A data tree is an unranked ordered tree whose every node is labelled by a letter from a finite 
alphabet and an element ("datum") from an infinite set, where the latter can only be compared 
for equality. The article considers alternating automata on data trees that can move downward 
and rightward, and have one register for storing data. The main results are that nonemptiness over 
finite data trees is decidable but not primitive recursive, and that nonemptiness of safety automata 
is decidable but not elementary. The proofs use nondeterministic tree automata with faulty 
counters. Allowing upward moves, leftward moves, or two registers, each causes undecidability. As 
corollaries, decidability is obtained for two data-sensitive fragments of the XPath query language. 

Categories and Subject Descriptors: F.4.1 [Mathematical Logic and Formal Languages]: 
Formal Languages — Decision problems; F.1.1 [Computation by Abstract Devices]: Models 
of Computation — Automata; H.2.3 [Database Management]: Languages — Query languages 

General Terms: Algorithms, Verification 



1. INTRODUCTION 

Context. Logics and automata for words and trees over finite alphabets are rel- 
atively well-understood. Motivated partly by the search for automated reasoning 
techniques for XML and the need for formal verification and synthesis of infinite- 
state systems, there is an active and broad research programme on logics and au- 
tomata for words and trees which have richer structure. 

Initial progress made on reasoning about data words and data trees is summarised 
in the survey by Segoufin [2006]. A data word is a word over $\Sigma \times \mathcal{D}$, where $\Sigma$ is 
a finite alphabet, and $\mathcal{D}$ is an infinite set ("domain") whose elements ("data") can 
only be compared for equality. Similarly, a data tree is a tree (countable, unranked 
and ordered) whose every node is labelled by a pair in $\Sigma \times \mathcal{D}$. 

First-order logic for data words was considered by Bojańczyk et al. [2006], and 
related automata were studied further by Björklund and Schwentick [2007]. The 
logic has variables which range over word positions ($\{0, \ldots, l-1\}$ or $\mathbb{N}$), a unary 
predicate for each letter from the finite alphabet, and a binary predicate $x \sim y$ 
which denotes equality of data labels. $\mathrm{FO}^2(+1, <, \sim)$ denotes such a logic with two 
variables and binary predicates $x + 1 = y$ and $x < y$. Over finite and over infinite 
data words, satisfiability for $\mathrm{FO}^2(+1, <, \sim)$ was shown decidable and at least as 
hard as nonemptiness of vector addition automata. Whether the latter problem is 
elementary has been open for many years. Extending the logic by one more variable 
causes undecidability. 

Over data trees, $\mathrm{FO}^2(+1, <, \sim)$ denotes a similar first-order logic with two 
variables. The variables range over tree nodes, $+1$ stands for two predicates "child" and 
"next sibling", and $<$ stands for two predicates "descendant" and "younger sibling". 
Complexity of satisfiability over finite data trees was studied by Bojańczyk et al. 
[2009]. For $\mathrm{FO}^2(+1, \sim)$, it was shown to be in 3NExpTime, but for $\mathrm{FO}^2(+1, <, \sim)$, 
to be at least as hard as nonemptiness of vector addition tree automata. Decidability 
of the latter is an open question, and it is equivalent to decidability of 
multiplicative exponential linear logic [de Groote et al. 2004]. However, Björklund 
and Bojańczyk [2007] showed that $\mathrm{FO}^2(+1, <, \sim)$ over finite data trees of bounded 
depth is decidable. 

XPath [Clark and DeRose 1999] is a prominent query language for XML docu- 
ments [Bray et al. 1998]. The most basic static analysis problem for XPath, with 
a variety of applications, is satisfiability in the presence of DTDs. In the two ex- 
tensive articles on its complexity [Benedikt et al. 2008; Geerts and Fan 2005], the 
only decidability result that allows negation and data (i.e., equality comparisons 
between attribute values) does not allow axes which are recursive (such as "self or 
descendant") or between siblings. By representing XML documents as data trees 
and translating from XPath to $\mathrm{FO}^2(+1, \sim)$, Bojańczyk et al. [2009] obtained a 
decidable fragment with negation, data and all nonrecursive axes. Another fragment 
of XPath was considered by Hallé et al. [2006], but it lacks concatenation, 
recursive axes and sibling axes. A recent advance of Figueira [2009] shows 
ExpTime-completeness for full downward XPath, but with restricted DTDs. 

An alternative approach to reasoning about data words is based on automata 
with registers [Kaminski and Francez 1994]. A register is used for storing a datum 
for later equality comparisons. Nonemptiness of one-way nondeterministic register 
automata over finite data words has relatively low complexity: NP-complete 
[Sakamoto and Ikeda 2000] or PSpace-complete [Demri and Lazić 2009], depending 
on technical details of their definition. Unfortunately, such automata fail to provide 
a satisfactory notion of regular language of finite data words, as they are not 
closed under complement [Kaminski and Francez 1994] and their nonuniversality 
is undecidable [Neven et al. 2004]. To overcome those limitations, one-way alternating 
automata with 1 register were proposed by Demri and Lazić [2009]: they 
are closed under Boolean operations, their nonemptiness over finite data words is 
decidable, and future-time fragments of temporal logics such as LTL or the modal 
$\mu$-calculus extended by 1 register are easily translatable to such automata. However, 
the nonemptiness problem over finite data words turned out to be not primitive 
recursive. Moreover, already with weak acceptance [Muller et al. 1986] and thus 
also with Büchi or co-Büchi acceptance, nonemptiness over infinite data words is 
undecidable (more precisely, co-r.e.-hard). When the automata are restricted to 
those which recognise safety properties [Alpern and Schneider 1987] over infinite 
data words, nonemptiness was shown to be ExpSpace-complete, and inclusion to 
be decidable but not primitive recursive [Lazić 2006]. 

Contribution. This article addresses one of the research directions proposed by 
Segoufin [2006]: investigating modal logics and automata with registers on data 
trees. Nondeterministic automata with registers which can be nondeterministically 
reassigned on finite binary data trees were recently studied by Kaminski and 
Tan [2008]: top-down and bottom-up variants recognise the same languages, and 
nonemptiness is decidable. However, they inherit the drawbacks of one-way 
nondeterministic register automata on data words: lack of closure under complement 
and undecidability of nonuniversality. 

We consider alternating automata that have 1 register and are forward, i.e., can 
move downward and rightward over tree nodes: for short, $\mathrm{ATRA}_1$. They are closed 
under Boolean operations, and we show that their nonemptiness over finite data 
trees is decidable. Moreover, forward fragments of CTL and the modal $\mu$-calculus 
with 1 register are easily translatable to $\mathrm{ATRA}_1$ [Jurdziński and Lazić 2007]. The 
expressiveness of $\mathrm{ATRA}_1$ is incomparable to those of $\mathrm{FO}^2(+1, \sim)$ and the automata 
of Kaminski and Tan [2008]: for example, the latter two formalisms but not $\mathrm{ATRA}_1$ 
can check whether some two leaves have equal data, and the opposite is true of 
checking whether each node's datum is fresh, i.e., does not appear at any ancestor 
node. By lower-bound results for register automata on data words in [Neven et al. 
2004; David 2004; Demri and Lazić 2009], we have that $\mathrm{ATRA}_1$ nonemptiness is not 
primitive recursive, and that it becomes undecidable (more precisely, r.e.-hard) if 
any of the following is added: upward moves, leftward moves, or one more register. 

Motivated partly by applications to XML streams (cf., e.g., [Olteanu et al. 2004]), 
we consider both finite and countably infinite data trees, where horizontal as well as 
vertical infinity is allowed. For $\mathrm{ATRA}_1$ with the weak acceptance mechanism, the 
undecidability result over infinite data words [Demri and Lazić 2009] carries over. 
However, we show that, for safety $\mathrm{ATRA}_1$, which are closed under intersection and 
union but not complement, inclusion is decidable and not primitive recursive. When 
a data tree is rejected by an automaton with the safety acceptance mechanism, there 
exists an initial segment whose every extension is rejected. We also obtain that 
nonemptiness of safety $\mathrm{ATRA}_1$ is not elementary. The latter is the most surprising 
result in the article: it means that the techniques in the proof that nonemptiness 
over infinite data words of safety one-way alternating automata with 1 register is 
in ExpSpace cannot be lifted to trees to obtain a 2ExpTime upper bound. 

The proofs of decidability involve translating from $\mathrm{ATRA}_1$ to forward nondeterministic 
tree automata with faulty counters. The counters are faulty in the sense 
that they are subject to incrementing errors, i.e., can spontaneously increase at any 
time. That makes the transition relations downwards compatible with a well-quasi- 
ordering (cf. [Finkel and Schnoebelen 2001]), which leads to lower complexities of 
some verification problems than with error-free counters. 

We define forward XPath to be the largest downward and rightward fragment in 
which, whenever two attribute values are compared for equality, one of them must 
be at the current node. By translating from forward XPath to $\mathrm{ATRA}_1$, we obtain 
decidability of satisfiability over finite documents and decidability of satisfiability 
for a safety subfragment, both in the presence of DTDs. In contrast to the decidable 
fragments of XPath mentioned previously, forward XPath has sibling axes, recursive 
axes, concatenation, negation, and data comparisons. 

2. PRELIMINARIES 

After fixing notations for trees and data trees, we define two kinds of forward au- 
tomata and look at some of their basic properties: alternating automata with 1 
register on data trees, and nondeterministic automata with counters with incre- 
menting errors on trees. 

2.1 Trees and Data Trees 

For technical simplicity, we shall work with binary trees instead of unranked ordered 
trees. Firstly, like Björklund and Bojańczyk [2007], we adopt the insignificant 
generalisation of considering unranked ordered forests, in which the roots are 
regarded as siblings with no parent. Secondly, the following is a standard and trivial 
one-to-one correspondence between unranked ordered forests $t$ and binary trees 
$\mathrm{bt}(t)$: the nodes of $\mathrm{bt}(t)$ are the same as the nodes of $t$, and the children of each 
node $n$ in $\mathrm{bt}(t)$ are the first child and next sibling of $n$ in $t$. The correspondence 
works for finite as well as infinite unranked ordered forests. In the latter, there may 
be infinite (of type $\omega$) branches or siblinghoods or both. 
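
For finite forests, the first-child/next-sibling correspondence can be sketched as follows. This is only an illustration: the class names `ForestNode` and `BinNode` and the function name `bt` are chosen here for the example, and an absent child (`None`) plays the role of an unlabelled leaf.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ForestNode:
    """A node of an unranked ordered forest (hypothetical representation)."""
    label: str
    children: List["ForestNode"] = field(default_factory=list)


@dataclass
class BinNode:
    """A node of the binary encoding; None stands for an unlabelled leaf."""
    label: str
    left: Optional["BinNode"] = None    # first child in the forest
    right: Optional["BinNode"] = None   # next sibling in the forest


def bt(forest: List[ForestNode]) -> Optional[BinNode]:
    """Encode a finite ordered forest (a list of sibling roots) as a binary tree."""
    result: Optional[BinNode] = None
    # Build the sibling chain from right to left, so that each node's
    # right-hand child is the encoding of its next sibling.
    for node in reversed(forest):
        result = BinNode(node.label, left=bt(node.children), right=result)
    return result
```

For instance, the forest with roots `a` (having children `b`, `c`) and `d` is encoded with `a` at the root, `b` as its left-hand child, `c` as the right-hand child of `b`, and `d` as the right-hand child of `a`.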

Without loss of generality, each node will either have both children or be a leaf, 
only nonleaf nodes will be labelled, and the root node will be nonleaf. Formally, a 
tree is a tuple $(N, \Sigma, \Lambda)$, where: 

— $N$ is a prefix-closed subset of $\{0,1\}^*$ such that $|N| > 1$ and, for each $n \in N$, 
either $n \cdot 0 \in N$ and $n \cdot 1 \in N$, or $n \cdot 0 \notin N$ and $n \cdot 1 \notin N$; 

— $\Sigma$ is a finite alphabet; 

— $\Lambda$ is a mapping from the nonleaf elements of $N$ to $\Sigma$. 

A data tree is a tree as above together with a mapping $\Delta$ from the nonleaf nodes 
to a fixed infinite set $\mathcal{D}$. For a data tree $\tau$, let $\mathrm{tree}(\tau)$ denote the underlying tree. 

For a data tree $\tau$ and $l > 0$, let the $l$-prefix of $\tau$ be the data tree obtained by 
restricting $\tau$ to nodes of length at most $l$. For each $\Sigma$, the set of all data trees with 
alphabet $\Sigma$ is a complete metric space with the following notion of distance: for 
distinct $\tau$ and $\tau'$, let $d(\tau, \tau') = 1/l$ where $l$ is least such that $\tau$ and $\tau'$ have distinct 
$l$-prefixes. 

2.2 Alternating Tree Register Automata 

Automata. A run of a forward alternating automaton with 1 register on a data 
tree will consist of a configuration for each tree node. Each configuration will be 
a finite set of threads, which are pairs of an automaton state and a register value, 
where the latter is a datum from $\mathcal{D}$. 

Following Brzozowski and Leiss [1980], transitions will be specified by positive 
Boolean formulae. For a set of states $Q$, let $\mathcal{B}^+(Q)$ consist of all formulae given by 
the following grammar, where $q \in Q$: 

$$\varphi ::= q(0, \downarrow) \mid q(0, \not\downarrow) \mid q(1, \downarrow) \mid q(1, \not\downarrow) \mid \top \mid \bot \mid \varphi \wedge \varphi \mid \varphi \vee \varphi$$

Given a configuration $G$ at a nonleaf tree node $n$, for each thread $(q, D)$ in $G$, the 
automaton transition function provides a formula $\varphi$ in $\mathcal{B}^+(Q)$, which depends on 
$q$, on the letter labelling $n$, and on whether $D = E$, where $E$ is the datum labelling 
$n$. In $\varphi$, an atom $r(d, \downarrow)$ requires that thread $(r, E)$ be in the configuration for node 
$n \cdot d$ (i.e., the register value is replaced by the datum at $n$), and an atom $r(d, \not\downarrow)$ 
requires the same for thread $(r, D)$ (i.e., the register value is not replaced). 

Formally, a forward alternating tree 1-register automaton (shortly, $\mathrm{ATRA}_1$) $\mathcal{A}$ is 
a tuple $(\Sigma, Q, q_I, F, \delta)$ such that: 

— $\Sigma$ is a finite alphabet and $Q$ is a finite set of states; 

— $q_I \in Q$ is the initial state and $F \subseteq Q$ are the final states; 

— $\delta : Q \times \Sigma \times \{\mathrm{tt}, \mathrm{ff}\} \to \mathcal{B}^+(Q)$ is a transition function. 

Runs and Languages. The semantics of the positive Boolean formulae can be 
given by defining when a quadruple $R_0^{\downarrow}, R_0^{\not\downarrow}, R_1^{\downarrow}, R_1^{\not\downarrow}$ of subsets of $Q$ satisfies a 
formula $\varphi$ in $\mathcal{B}^+(Q)$, by structural recursion. The cases for the Boolean atoms and 
operators are standard, and for the remaining atoms, with $d \in \{0, 1\}$ and $? \in \{\downarrow, \not\downarrow\}$, we have: 

$$R_0^{\downarrow}, R_0^{\not\downarrow}, R_1^{\downarrow}, R_1^{\not\downarrow} \models r(d, ?) \iff r \in R_d^{?}$$

We can now define the transition relation of $\mathcal{A}$, which is between configurations 
and pairs of configurations, and relative to a letter and a datum. We write 
$G \to_a^E H_0, H_1$ iff, for each thread $(q, D) \in G$, there exist 
$R_0^{\downarrow}, R_0^{\not\downarrow}, R_1^{\downarrow}, R_1^{\not\downarrow} \models \delta(q, a, D = E)$ such that, for both $d \in \{0, 1\}$: 

$$\{(r, E) : r \in R_d^{\downarrow}\} \cup \{(r, D) : r \in R_d^{\not\downarrow}\} \subseteq H_d$$

A run of $\mathcal{A}$ on a data tree $(N, \Sigma, \Lambda, \Delta)$ is a mapping $n \mapsto G_n$ from the nodes to 
configurations such that: 

— the initial thread is in the configuration at the root, i.e. $(q_I, \Delta(\varepsilon)) \in G_\varepsilon$; 

— for each nonleaf $n$, the transition relation is observed, i.e. $G_n \to_{\Lambda(n)}^{\Delta(n)} G_{n \cdot 0}, G_{n \cdot 1}$. 

We say that the run is: 

— final iff, for each leaf $n$, only final states occur in $G_n$; 

— finite iff there exists $l$ such that, for each $n$ of length at least $l$, $G_n$ is empty. 

We may regard $\mathcal{A}$ as an automaton on finite data trees, a safety automaton, or 
a co-safety automaton. We say that: 

— $\mathcal{A}$ accepts a finite data tree $\tau$ iff $\mathcal{A}$ has a final run on $\tau$; 

— $\mathcal{A}$ safety-accepts a data tree $\tau$ iff $\mathcal{A}$ has a final run on $\tau$; 

— $\mathcal{A}$ co-safety-accepts a data tree $\tau$ iff $\mathcal{A}$ has a final finite run on $\tau$. 

Observe that, for finite data trees, the three modes of $\mathcal{A}$ coincide. 

Let $L^{\mathrm{fin}}(\mathcal{A})$ denote the set of all finite data trees with alphabet $\Sigma$ that $\mathcal{A}$ accepts, 
and $L^{\mathrm{saf}}(\mathcal{A})$ (resp., $L^{\mathrm{cos}}(\mathcal{A})$) denote the set of all data trees with alphabet $\Sigma$ that 
$\mathcal{A}$ safety-accepts (resp., co-safety-accepts). 

Remark 2.1. The valid initial and successor configurations in runs were defined 
in terms of lower bounds on sets. In other words, while running on any data tree, 
at each node the automaton is free to introduce arbitrary "junk" threads. However, 
final and finite runs were defined in terms of upper bounds on sets, so junk threads 
can only make it harder to complete a partial run into an accepting one. This will 
play an important role in the proof of decidability in Theorem 3.1. 
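
The satisfaction relation and the transition relation of this subsection can be sketched in Python as below. Everything about the encoding is an assumption made for illustration: formulae are nested tuples, an atom is `("atom", r, d, store)` with `store=True` for the register-replacing atom, and threads are `(state, datum)` pairs. The function `is_transition` relies on the observation that satisfaction of positive formulae is monotone, so it suffices to test the largest quadruple that the candidate successor configurations permit.

```python
def satisfies(R, phi):
    """Does the quadruple R satisfy the positive Boolean formula phi?

    R maps each pair (d, store) to a set of states.
    """
    kind = phi[0]
    if kind == "true":
        return True
    if kind == "false":
        return False
    if kind == "atom":
        _, r, d, store = phi
        return r in R[(d, store)]
    if kind == "and":
        return satisfies(R, phi[1]) and satisfies(R, phi[2])
    if kind == "or":
        return satisfies(R, phi[1]) or satisfies(R, phi[2])
    raise ValueError(f"unknown formula: {phi!r}")


def is_transition(G, a, E, delta, H0, H1):
    """Check whether G steps to (H0, H1) on letter a and datum E."""
    H = (H0, H1)
    for (q, D) in G:
        # Largest quadruple allowed by H0, H1 for this thread: states paired
        # with E (register replaced) or with D (register kept).
        R = {
            (d, store): {r for (r, X) in H[d] if X == (E if store else D)}
            for d in (0, 1)
            for store in (True, False)
        }
        if not satisfies(R, delta(q, a, D == E)):
            return False
    return True


def delta_b1(q, a, eq):
    """Hypothetical two-step automaton: both atoms keep the register value."""
    if a == "b1" and q == "q":
        return ("and", ("atom", "q'", 0, False), ("atom", "q''", 1, False))
    if a == "b1" and q == "q'":
        return ("and", ("atom", "q''", 0, False), ("atom", "q''", 1, False))
    return ("false",)
```

Because successor configurations are only bounded from below, `is_transition` still succeeds when `H0` or `H1` contains extra "junk" threads, in line with Remark 2.1.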

Boolean Operations. Given an $\mathrm{ATRA}_1$ $\mathcal{A}$, let $\overline{\mathcal{A}}$ denote its dual: the automaton 
obtained by replacing the set of final states with its complement and replacing, in 
each transition formula $\delta(q, a, p)$, every $\top$ with $\bot$, every $\wedge$ with $\vee$, and vice versa. 

Observe that $\overline{\overline{\mathcal{A}}} = \mathcal{A}$. Considering $\mathcal{A}$ (resp., $\overline{\mathcal{A}}$) as a weak alternating automaton 
whose every state is of even (resp., odd) parity, we have by [Löding and Thomas 
2000, Theorem 1] that $L^{\mathrm{cos}}(\overline{\mathcal{A}})$ is the complement of $L^{\mathrm{saf}}(\mathcal{A})$. Hence, we also have 
that $L^{\mathrm{saf}}(\overline{\mathcal{A}})$ is the complement of $L^{\mathrm{cos}}(\mathcal{A})$, and that $L^{\mathrm{fin}}(\overline{\mathcal{A}})$ is the complement 
of $L^{\mathrm{fin}}(\mathcal{A})$. 

For each $m$ of fin, saf, cos, given $\mathrm{ATRA}_1$ $\mathcal{A}_1$ and $\mathcal{A}_2$ with alphabet $\Sigma$, an 
automaton whose language in mode $m$ is $L^m(\mathcal{A}_1) \cap L^m(\mathcal{A}_2)$ (resp., $L^m(\mathcal{A}_1) \cup L^m(\mathcal{A}_2)$) 
is easily constructible. It suffices to form a disjoint union of $\mathcal{A}_1$ and $\mathcal{A}_2$, and 
add a new initial state $q_I$ such that $\delta(q_I, a, \mathrm{tt}) = \delta(q_I^1, a, \mathrm{tt}) \wedge \delta(q_I^2, a, \mathrm{tt})$ (resp., 
$\delta(q_I, a, \mathrm{tt}) = \delta(q_I^1, a, \mathrm{tt}) \vee \delta(q_I^2, a, \mathrm{tt})$) for each $a \in \Sigma$, where $q_I^1$ and $q_I^2$ are the initial 
states of $\mathcal{A}_1$ and $\mathcal{A}_2$. (Since the initial thread's register value is always the root 
node's datum, the formulae $\delta(q_I, a, \mathrm{ff})$ are irrelevant.) 

We therefore obtain: 

Proposition 2.2. (a) $\mathrm{ATRA}_1$ on finite data trees are closed under complement, 
intersection and union. 
(b) Safety $\mathrm{ATRA}_1$ and co-safety $\mathrm{ATRA}_1$ are dual, and each is closed under 
intersection and union. 

In each case, a required automaton is computable in logarithmic space. 
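
As an illustration of the dualisation step, the following sketch swaps the Boolean constants and connectives of a transition formula while leaving atoms untouched. The tuple encoding of formulae, with atoms written `("atom", r, d, store)`, is an assumption made for this example only; complementing the set of final states is a separate, straightforward step not shown here.

```python
def dual(phi):
    """Return the dual of a positive Boolean formula (atoms are unchanged)."""
    kind = phi[0]
    if kind == "true":
        return ("false",)
    if kind == "false":
        return ("true",)
    if kind == "atom":
        return phi
    if kind == "and":
        return ("or", dual(phi[1]), dual(phi[2]))
    if kind == "or":
        return ("and", dual(phi[1]), dual(phi[2]))
    raise ValueError(f"unknown formula: {phi!r}")
```

Applying `dual` twice gives back the original formula, which mirrors the observation that dualising an automaton twice yields the automaton itself.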

Safety Languages. A set $L$ of data trees with alphabet $\Sigma$ is called safety [Alpern 
and Schneider 1987] iff it is closed with respect to the metric defined in Section 2.1, 
i.e. for each data tree $\tau$, if for all $l > 0$ there exists $\tau'_l \in L$ such that the $l$-prefixes 
of $\tau$ and $\tau'_l$ are equal, then $\tau \in L$. The complements of safety languages, i.e. the 
open sets of data trees, are called co-safety. 

Proposition 2.3. For each $\mathrm{ATRA}_1$ $\mathcal{A}$, we have that $L^{\mathrm{saf}}(\mathcal{A})$ is safety and 
$L^{\mathrm{cos}}(\mathcal{A})$ is co-safety. 

PROOF. By Proposition 2.2(b), it suffices to show that $L^{\mathrm{saf}}(\mathcal{A})$ is safety. Suppose 
for all $l > 0$ there exists $\tau'_l \in L^{\mathrm{saf}}(\mathcal{A})$ such that the $l$-prefixes of $\tau$ and $\tau'_l$ are equal. 

For each $l > 0$, let us fix a final run $n \mapsto G^l_n$ of $\mathcal{A}$ on $\tau'_l$. For each $0 \leq k \leq l$, let 
$\mathcal{G}_{l,k}$ denote the restriction of the run $n \mapsto G^l_n$ to nodes $n$ of length $k$. 

Consider the tree consisting of the empty sequence and all sequences 
$\mathcal{G}_{l,0} \cdot \mathcal{G}_{l,1} \cdots \mathcal{G}_{l,k}$ for $l > 0$ and $0 \leq k \leq l$. Without loss of generality, each register value 
in each $G^l_n$ labels some node of $\tau'_l$ on the path from the root to $n$, so the tree is 
finitely branching. By König's Lemma, it has an infinite path $\mathcal{H}_0 \cdot \mathcal{H}_1 \cdots$. For 
each $0 \leq k$, $\mathcal{H}_k$ is a mapping from the nodes of $\tau$ of length $k$ to configurations of 
$\mathcal{A}$. It remains to observe that $n \mapsto \mathcal{H}_{|n|}(n)$ is a final run of $\mathcal{A}$ on $\tau$. □ 

Example 2.4. By recursion on $k \geq 1$, we shall define $\mathrm{ATRA}_1$ $\mathcal{B}_k$ with alphabet 
$\{b_1, \ldots, b_k, *\}$. As well as being interesting examples of $\mathrm{ATRA}_1$, the $\mathcal{B}_k$ will be used 
in the nonelementarity part of the proof of Theorem 4.1. 

Let $\mathcal{B}_1$ be the automaton depicted in Figure 1. It has three states, where $q$ is 
initial, and $q''$ is final. We have $\delta(q, b_1, p) = q'(0, \not\downarrow) \wedge q''(1, \not\downarrow)$ and $\delta(q', b_1, p) = 
q''(0, \not\downarrow) \wedge q''(1, \not\downarrow)$ for both $p \in \{\mathrm{tt}, \mathrm{ff}\}$, and the transition function gives $\bot$ in all 
other cases. (Recalling that the initial thread's register value is the root node's 
datum, the formula $\delta(q, b_1, \mathrm{ff})$ is in fact irrelevant.) Observe that $\mathcal{B}_1$ safety-accepts 
exactly data trees that have two nonleaf nodes, the root and its left-hand child, 
and both are labelled by letter $b_1$. 

Fig. 1. Defining $\mathcal{B}_1$ 

For each $k \geq 1$, $\mathcal{B}_{k+1}$ is defined so that it safety-accepts a data tree over 
$\{b_1, \ldots, b_{k+1}, *\}$ iff: 

(i) the root node is labelled by $b_{k+1}$, its left-hand child is labelled by $b_{k+1}$, and its 
right-hand child is a leaf; 

(ii) for each node $n$ labelled by $b_{k+1}$, which is not the root, the left-hand child of 
$n$ is labelled by $*$ and both of its children are labelled by $b_{k+1}$, and the right-hand 
subtree at $n$ is safety-accepted by $\mathcal{B}_k$; 

(iii) whenever a node $n$, which is not the root, and a descendant $n'$ of $n$ are labelled 
by $b_{k+1}$, we have that their data labels are distinct, and that the datum at $n$ 
equals the datum at some node which is labelled by $b_k$ and which is in the 
right-hand subtree at $n'$. 

By Proposition 2.2(b), it suffices to define automata for (i)-(iii) separately. 
Expressing (i) and (ii) is straightforward, and an automaton for (iii) is depicted in 
Figure 2. It has four states, where $q_0$ is initial, and $q_1$ and $q_2$ are final. For all 
letters $a$ and Booleans $p$, we have $\delta(q_0, a, p) = q_1(0, \not\downarrow)$, so initially the automaton 
moves to the left-hand child of the root and changes the state to $q_1$. From 
$q_1$, if the current node is labelled by $*$, the automaton moves to both children: 
$\delta(q_1, *, p) = q_1(0, \not\downarrow) \wedge q_1(1, \not\downarrow)$ for both $p$. Also from $q_1$, if the current node $n$ is 
labelled by $b_{k+1}$, the automaton both moves to the left-hand child without changing 
the state, and moves to the left-hand child while storing the datum at $n$ in the 
register and changing the state to $q_2$: $\delta(q_1, b_{k+1}, p) = q_1(0, \not\downarrow) \wedge q_2(0, \downarrow)$ for both $p$. 
From $q_2$, the behaviour for $*$ is analogous to that from $q_1$, but if the current node's 
letter is $b_{k+1}$ and its datum is distinct from the datum in the register, the automaton 
both moves to the left-hand child without changing the state and moves to the 
right-hand child while changing the state to $q_3$: $\delta(q_2, b_{k+1}, \mathrm{ff}) = q_2(0, \not\downarrow) \wedge q_3(1, \not\downarrow)$. 
The remainder of Figure 2 is interpreted similarly, and in cases not depicted, the 
transition function gives $\bot$. Since the mode of acceptance is safety, the automaton 
in fact expresses: 

(iii') whenever a node $n$, which is not the root, and a descendant $n'$ of $n$ are labelled 
by $b_{k+1}$, we have that their data labels are distinct, and that either the datum 
at $n$ equals the datum at some node which is labelled by $b_k$ and which is in the 
right-hand subtree at $n'$, or that subtree is infinite. 

Fig. 2. Defining $\mathcal{B}_{k+1}$ 

Let $2 \Uparrow 0 = 1$, and $2 \Uparrow k = 2^{2 \Uparrow (k-1)}$ for $k \geq 1$. By induction on $k \geq 1$, the safety 
language of $\mathcal{B}_k$ has the following two properties. In particular, in the presence of 
(i) and (ii), we have that (iii) and (iii') are equivalent. 

— for every $\tau$ safety-accepted by $\mathcal{B}_k$, every downward sequence which is from the 
left-hand child of the root and which consists of nodes labelled by $b_k$ is of length 
at most $2 \Uparrow (k-1)$, so $\tau$ is finite and has at most $2 \Uparrow k$ nodes labelled by $b_k$; 

— for some $\tau$ safety-accepted by $\mathcal{B}_k$, the nodes labelled by $b_k$ other than the root 
form a full binary tree of height $2 \Uparrow (k-1)$ (after removing the nodes labelled 
by $*$), so there are $2 \Uparrow k$ nodes labelled by $b_k$, and moreover the data at those 
nodes are mutually distinct. 

Finally, we observe that for computing $\mathcal{B}_k$, space logarithmic in $k$ suffices. 
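
To give a feel for how quickly the bound in this example grows, here is a small sketch of the tower function used above. The function name `tower` is chosen for illustration only.

```python
def tower(k: int) -> int:
    """Tower of exponentials: tower(0) = 1 and tower(k) = 2 ** tower(k - 1)."""
    if k < 0:
        raise ValueError("k must be nonnegative")
    value = 1
    for _ in range(k):
        value = 2 ** value
    return value
```

The first values are 1, 2, 4, 16 and 65536, and `tower(5)` already has 19729 decimal digits, which is why a bound of this form is not elementary.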

2.3 Faulty Tree Counter Automata 

In Section 3, we shall establish decidability of nonemptiness of forward alternating 
tree 1-register automata over finite data trees, by translating them to automata 
which have natural-valued counters with increments, decrements and zero-tests. 
The translation will eliminate conjunctive branchings, by having configurations of 
the former automata (which are finite sets of threads) correspond to pairs of states 
and counter valuations, so the latter automata will be only nondeterministic. Also, 
data will be abstracted in the translation, so the counter automata will run on finite 
trees (without data). 

The feature that will make nonemptiness of the counter automata decidable (on 
finite trees) is that they will be faulty, in the sense that one or more counters 
can erroneously increase at any time. The key insight is that such faults do not 
affect the translation's preservation of nonemptiness: they in fact correspond to 
introductions of "junk" threads in runs of $\mathrm{ATRA}_1$ (cf. Remark 2.1). 

For clarity of the correspondence between the finitary languages of $\mathrm{ATRA}_1$ and 
the languages of their translations, the counter automata will have $\varepsilon$-transitions. 

We now define the counter automata, and show their nonemptiness decidable. 

Automata. An incrementing tree counter automaton (shortly, ITCA) $\mathcal{C}$, which is 
forward and with $\varepsilon$-transitions, is a tuple $(\Sigma, Q, q_I, F, k, \delta)$ such that: 

— $\Sigma$ is a finite alphabet and $Q$ is a finite set of states; 

— $q_I \in Q$ is the initial state and $F \subseteq Q$ are the final states; 

— $k \in \mathbb{N}$ is the number of counters; 

— $\delta \subseteq (Q \times \Sigma \times L \times Q \times Q) \cup (Q \times \{\varepsilon\} \times L \times Q)$ is a transition relation, where 
$L = \{\mathrm{inc}, \mathrm{dec}, \mathrm{ifz}\} \times \{1, \ldots, k\}$ is the instruction set. 

Runs and Languages. A counter valuation is a mapping from $\{1, \ldots, k\}$ to $\mathbb{N}$. 
For counter valuations $v$ and $v'$, we write: 

$v \leq v'$ iff $v(c) \leq v'(c)$ for all $c$ 

$v \xrightarrow{(\mathrm{inc}, c)}_{\surd} v'$ iff $v' = v[c \mapsto v(c) + 1]$ 

$v \xrightarrow{(\mathrm{dec}, c)}_{\surd} v'$ iff $v' = v[c \mapsto v(c) - 1]$ 

$v \xrightarrow{(\mathrm{ifz}, c)}_{\surd} v'$ iff $v(c) = 0$ and $v' = v$ 

$v \xrightarrow{l} v'$ iff $v \leq v_{\surd} \xrightarrow{l}_{\surd} v'_{\surd} \leq v'$ for some $v_{\surd}$, $v'_{\surd}$ 

A configuration of $\mathcal{C}$ is a pair $(q, v)$, where $q$ is a state and $v$ is a counter valuation. 

To define runs, we first specify that a block is a nonempty finite sequence of 
configurations obtainable by performing $\varepsilon$-transitions, i.e. for every two adjacent 
configurations $(q_i, v_i)$ and $(q_{i+1}, v_{i+1})$ in a block, there exists $l$ with $(q_i, \varepsilon, l, q_{i+1}) \in \delta$ 
and $v_i \xrightarrow{l} v_{i+1}$. 

Now, a run of $\mathcal{C}$ on a finite tree $(N, \Sigma, \Lambda)$ is a mapping $n \mapsto B_n$ from the nodes 
to blocks such that: 

— $(q_I, \mathbf{0})$ is the first configuration in $B_\varepsilon$; 

— for each nonleaf $n$, there exists $l$ with $(q, \Lambda(n), l, r_0, r_1) \in \delta$, $u \xrightarrow{l} w_0$ and $u \xrightarrow{l} w_1$, 
where $(q, u)$ is the last configuration in $B_n$, and $(r_0, w_0)$ and $(r_1, w_1)$ are the first 
configurations in $B_{n \cdot 0}$ and $B_{n \cdot 1}$ (respectively). 

We regard such a run as accepting iff, for each leaf $n$, the state of the last 
configuration in $B_n$ is final. The language $L(\mathcal{C})$ is the set of all finite trees with alphabet 
$\Sigma$ on which $\mathcal{C}$ has an accepting run. 

Decidability of Nonemptiness. We remark that, since nonemptiness of incrementing 
counter automata over words is not primitive recursive [Demri and Lazić 2009, 
Theorem 2.9(b)], the same is true of nonemptiness of ITCA. 

Theorem 2.5. Nonemptiness of ITCA is decidable. 

PROOF. Consider an ITCA $\mathcal{C} = (\Sigma, Q, q_I, F, k, \delta)$. 

For counter valuations $v$ and $v'$, and an instruction $l$, we say that $v$ under $l$ yields 
$v'$ lazily and write $v \xrightarrow{l}_{\mathrm{lazy}} v'$ iff either $v \xrightarrow{l}_{\surd} v'$ (i.e., there are no incrementing errors), 
or $l$ is of the form $(\mathrm{dec}, c)$, $v(c) = 0$ and $v' = v$ (i.e., $c$ is erroneously incremented 
to 1 and then decremented to 0). Observe that: 

(*) Whenever $v \leq w$ and $w \xrightarrow{l} w'$, there exists $v'$ such that $v \xrightarrow{l}_{\mathrm{lazy}} v'$ and $v' \leq w'$. 

To reduce the nonemptiness problem for $\mathcal{C}$ to a reachability problem, let a level 
of $\mathcal{C}$ be a finite set of configurations. For levels $\mathcal{G}$ and $\mathcal{G}'$ of $\mathcal{C}$, let us write $\mathcal{G} \to \mathcal{G}'$ 
iff $\mathcal{G}'$ can be obtained from $\mathcal{G}$ as follows: 

— each $(q, v) \in \mathcal{G}$ with $q \notin F$ is replaced either by the two configurations that some 
firable transition $(q, a, l, r_0, r_1)$ yields lazily, or by the one configuration that some 
firable transition $(q, \varepsilon, l, r)$ yields lazily; 

— each $(q, v) \in \mathcal{G}$ with $q \in F$ is removed. 


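The error-free, lazy and faulty steps on counter valuations used in this proof can be sketched as follows. The encoding is an assumption made for illustration: a valuation is a tuple of naturals, counters are indexed from 0 rather than 1, and an instruction is a pair such as `("dec", 0)`. The function `faulty_step_allowed` uses the observation, which follows from the definitions above, that the lazy successor is the least faulty successor.

```python
def exact_step(v, instr):
    """Error-free step; returns the successor valuation or None if not firable."""
    op, c = instr
    if op == "inc":
        return v[:c] + (v[c] + 1,) + v[c + 1:]
    if op == "dec":
        return v[:c] + (v[c] - 1,) + v[c + 1:] if v[c] > 0 else None
    if op == "ifz":
        return v if v[c] == 0 else None
    raise ValueError(f"unknown instruction: {instr!r}")


def lazy_step(v, instr):
    """Lazy step: error-free, except that decrementing a zero counter is a no-op."""
    op, c = instr
    if op == "dec" and v[c] == 0:
        return v
    return exact_step(v, instr)


def leq(v, w):
    """Pointwise ordering on counter valuations."""
    return all(x <= y for x, y in zip(v, w))


def faulty_step_allowed(v, instr, w):
    """Can v reach w under instr when counters may spontaneously increase?"""
    least = lazy_step(v, instr)
    return least is not None and leq(least, w)
```

For example, with `v = (0, 1)`, `w = (2, 1)` and the instruction `("dec", 0)`, the faulty step from `w` to `(1, 1)` is matched by the lazy step from `v` to `(0, 1)`, which is an instance of observation (*).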

10 • M. Jurdziński and R. Lazić 



Performing transitions of $\mathcal{C}$ lazily ensures that, for every level $\mathcal{G}$, the set 
$\{\mathcal{G}' : \mathcal{G} \to \mathcal{G}'\}$ of all its successors is finite. The latter set is also computable. By the 
definition of accepting runs and (*), we have that $\mathcal{C}$ is nonempty iff the empty level 
is reachable from the initial level $\{(q_I, \mathbf{0})\}$. 

For configurations $(q, v)$ and $(r, w)$, let $(q, v) \leq (r, w)$ iff $q = r$ and $v \leq w$. Now, 
let $\preceq$ be the quasi-ordering obtained by lifting $\leq$ to levels: $\mathcal{G} \preceq \mathcal{H}$ iff, for each 
$(q, v) \in \mathcal{G}$, there exists $(r, w) \in \mathcal{H}$ such that $(q, v) \leq (r, w)$. By Higman's Lemma 
[Higman 1952], $\preceq$ is a well-quasi-ordering, i.e., for every infinite sequence $\mathcal{G}_0, \mathcal{G}_1, \ldots$, 
there exist $i < j$ such that $\mathcal{G}_i \preceq \mathcal{G}_j$. Observe that, in the terminology of Finkel and 
Schnoebelen [2001], $\preceq$ is strongly downward-compatible with $\to$: whenever $\mathcal{G} \preceq \mathcal{H}$ 
and $\mathcal{H} \to \mathcal{H}'$, there exists $\mathcal{G}'$ such that $\mathcal{G} \to \mathcal{G}'$ and $\mathcal{G}' \preceq \mathcal{H}'$. Also, $\preceq$ is decidable. 

Since $\mathcal{G} \preceq \emptyset$ iff $\mathcal{G} = \emptyset$, we have reduced nonemptiness of $\mathcal{C}$ to the subcovering 
problem for downward well-structured transition systems with reflexive (which is 
weaker than strong) compatibility, computable successor sets and decidable ordering. 
The latter is decidable by [Finkel and Schnoebelen 2001, Theorem 5.5]. □ 
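
The quasi-ordering on levels used in this proof is simple to decide. A minimal sketch, assuming configurations are encoded as `(state, valuation)` pairs with valuations as tuples of naturals (an encoding chosen here for illustration):

```python
def config_leq(c1, c2):
    """(q, v) <= (r, w) iff q = r and v <= w pointwise."""
    (q, v), (r, w) = c1, c2
    return q == r and all(x <= y for x, y in zip(v, w))


def level_leq(G, H):
    """Lifted ordering: every configuration of G is dominated by one of H."""
    return all(any(config_leq(c, d) for d in H) for c in G)
```

In particular, `level_leq(G, set())` holds exactly when `G` is empty, which is the fact that turns reachability of the empty level into a subcovering question.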

3. DECIDABILITY OVER FINITE DATA TREES 

Theorem 3.1. Nonemptiness of $\mathrm{ATRA}_1$ over finite data trees is decidable and 
not primitive recursive. 

Proof. By considering data words as data trees (e.g., by using only left-hand 
children starting from the root), the lower bound follows from non-primitive 
recursiveness of nonemptiness of one-way co-nondeterministic (i.e., with only conjunctive 
branching) automata with 1 register over finite data words [Demri and Lazić 2009, 
Theorem 5.2]. 

We shall establish decidability by reducing to nonemptiness of ITCA, which is 
decidable by Theorem 2.5. More specifically, by extending to trees the translation in 
the proof of [Demri and Lazić 2009, Theorem 4.4], which is from one-way alternating 
automata with 1 register on finite data words to incrementing counter automata 
on finite words, we shall show that, for each $\mathrm{ATRA}_1$ $\mathcal{A}$, an ITCA $\mathcal{C}_{\mathcal{A}}$ with the 
same alphabet and such that $L(\mathcal{C}_{\mathcal{A}}) = \{\mathrm{tree}(\tau) : \tau \in L^{\mathrm{fin}}(\mathcal{A})\}$, is computable (in 
polynomial space). 

Let $\mathcal{A} = (\Sigma, Q, q_I, F, \delta)$. For a configuration $G$ of $\mathcal{A}$ and a datum $D$, let the bundle 
of $D$ in $G$ be the set of all states that are paired with $D$, i.e. $\{q : (q, D) \in G\}$. The 
computation of $\mathcal{C}_{\mathcal{A}}$ with the properties above is based on the following abstraction of 
configurations of $\mathcal{A}$ by mappings from $\mathcal{P}(Q) \setminus \{\emptyset\}$ to $\mathbb{N}$. The abstract configuration 
$\overline{G}$ counts, for each nonempty $S \subseteq Q$, the number of data whose bundles equal $S$: 

$$\overline{G}(S) = |\{D : \{q : (q, D) \in G\} = S\}|$$

Thus, two configurations have the same abstraction iff they are equal up to a 
bijective renaming of data. For $1 \leq i \leq \overline{G}(S)$ and $q \in S$, we shall call pairs $(S, i)$ 
abstract data and triples $(q, S, i)$ abstract threads. 
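
The abstraction of a configuration by bundle counts can be sketched as follows. The representation is an assumption for illustration: a configuration is a set of `(state, datum)` pairs, and the abstract configuration is a `collections.Counter` keyed by frozen sets of states.

```python
from collections import Counter, defaultdict


def bundles(G):
    """Map each datum occurring in G to its bundle (the states paired with it)."""
    by_datum = defaultdict(set)
    for q, D in G:
        by_datum[D].add(q)
    return {D: frozenset(S) for D, S in by_datum.items()}


def abstraction(G):
    """Count, for each nonempty set of states S, the data whose bundle equals S."""
    return Counter(bundles(G).values())
```

Renaming the data of a configuration bijectively leaves `abstraction` unchanged, since only the bundles and their multiplicities are recorded.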

For abstract configurations $v$, $w_0$ and $w_1$, letters $a$, and sets of states $Q_=$ with 
either $v(Q_=) > 0$ or $Q_= = \emptyset$, we shall define transitions $v \to_a^{Q_=} w_0, w_1$, and show 
that they are bisimilar to transitions $G \to_a^E H_0, H_1$ such that $v = \overline{G}$, $w_0 = \overline{H_0}$, 
$w_1 = \overline{H_1}$ and $Q_= = \{q : (q, E) \in G\}$. The sets $Q_=$ can hence be thought of as 
abstractions of the data $E$. The abstract transitions will then give us a notion of 
abstract run of $\mathcal{A}$ on a finite tree (without data), where the sets $Q_=$ are guessed at 
every step. By the bisimilarity, we shall have that: 

(I) $\mathcal{A}$ has an accepting abstract run on a finite tree $t$ with alphabet $\Sigma$ iff it has an 
accepting run on some data tree $\tau$ such that $t = \mathrm{tree}(\tau)$. 

In other words, we shall have reduced the question of whether $L^{\mathrm{fin}}(\mathcal{A})$ is nonempty, 
i.e. whether there exists a finite tree with alphabet $\Sigma$, a data labelling of its nonleaf 
nodes, and an accepting run of $\mathcal{A}$ on the resulting data tree, to whether there 
exists a finite tree and an accepting abstract run of $\mathcal{A}$ on it. It will then remain to 
show how to compute (in polynomial space) an ITCA $\mathcal{C}_{\mathcal{A}}$ which guesses and checks 
accepting abstract runs of $\mathcal{A}$, so that: 

(II) $\mathcal{C}_{\mathcal{A}}$ has an accepting run on a finite tree $t$ with alphabet $\Sigma$ iff $\mathcal{A}$ has an accepting 
abstract run on $t$. 

To begin delivering our promises, we now define transitions from abstract
configurations $v$ for letters $a$ and sets of states $Q_=$ with either $v(Q_=) > 0$ or $Q_= = \emptyset$
to abstract configurations $w_0$ and $w_1$, essentially by reformulating the definition of
concrete transitions (cf. Section 2.2) in terms of abstract threads. For each abstract
datum $(S, i)$ of $v$ and both $d \in \{0, 1\}$, the abstract threads whose abstract datum
is $(S, i)$ will contribute two sets of states to such a transition: $R'(S,i)^{\downarrow}_d$, for which
the automaton's register is updated, and $R'(S,i)^{\not\downarrow}_d$, for which the automaton's
register is not updated. If $Q_=$ is nonempty, we take $(Q_=, 1)$ to represent the datum
abstracted by $Q_=$, i.e. with which the register is updated, so states in the union of
the set $R'(Q_=, 1)^{\not\downarrow}_d$ and all the sets $R'(S, i)^{\downarrow}_d$ will be associated to the same abstract
datum of $w_d$. Formally, let $v \xrightarrow{a,\,Q_=} w_0, w_1$ mean that, for each abstract datum $(S, i)$
of $v$, there exist sets of states $R'(S, i)^{\downarrow}_0, R'(S, i)^{\not\downarrow}_0, R'(S, i)^{\downarrow}_1, R'(S, i)^{\not\downarrow}_1$ such that:

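(i) for each abstract thread $(q, S, i)$ of $v$, there exist

$R^{\downarrow}_0, R^{\not\downarrow}_0, R^{\downarrow}_1, R^{\not\downarrow}_1 \models \delta(q, a, (S, i) = (Q_=, 1))$

which satisfy $R^{?}_d \subseteq R'(S, i)^{?}_d$ for both $d \in \{0, 1\}$ and $? \in \{\downarrow, \not\downarrow\}$;

(ii) for both $d \in \{0, 1\}$ and each nonempty $S' \subseteq Q$, we have

$|\{(S,i) : (S,i) \neq (Q_=,1) \wedge R'(S,i)^{\not\downarrow}_d = S'\}| + \begin{cases} 1, & \text{if } R^{=}_d = S' \\ 0, & \text{otherwise} \end{cases} \;\leq\; w_d(S')$

for some $R^{=}_d \supseteq R'(Q_=,1)^{\not\downarrow}_d \cup \bigcup_{S} \bigcup_{1 \leq i \leq v(S)} R'(S,i)^{\downarrow}_d$.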

It is straightforward to check the following two-part correspondence between the
abstract transitions just defined and concrete transitions:

(IIIa) Whenever $G \xrightarrow{a,\,E} H_0, H_1$, we have $v \xrightarrow{a,\,Q_=} w_0, w_1$, where $v = \overline{G}$, $w_0 = \overline{H_0}$,
$w_1 = \overline{H_1}$ and $Q_= = \{q : (q, E) \in G\}$.

(IIIb) Whenever $\overline{G} = v$ and $v \xrightarrow{a,\,Q_=} w_0, w_1$, there exist $E$, $H_0$ and $H_1$ such that
$G \xrightarrow{a,\,E} H_0, H_1$, $w_0 = \overline{H_0}$, $w_1 = \overline{H_1}$ and $Q_= = \{q : (q, E) \in G\}$.

Let $\alpha$ be a bijection between the abstract data of $v$ and the data that occur in
$G$, which is bundle preserving (i.e., whenever $\alpha(S, i) = D$, we have that $S$ is the
bundle of $D$ in $G$), and if $Q_=$ is nonempty then $\alpha(Q_=, 1) = E$.

— To show (IIIa), for each abstract datum $(S,i)$ of $v$ and both $d \in \{0,1\}$, take
$R'(S,i)^{\downarrow}_d$ and $R^{=}_d$ to be the bundle of $E$ in $H_d$, and take $R'(S,i)^{\not\downarrow}_d$ to be the
bundle of $\alpha(S, i)$ in $H_d$.

— For (IIIb), if $Q_=$ is empty then take $E$ to be an arbitrary datum which does not
occur in $G$, pick the same quadruples for the threads in $G$ as for the corresponding
(via $\alpha$) abstract threads of $v$, and for both $d \in \{0,1\}$, obtain $H_d$ from $w_d$ by
replacing each set of abstract data $(S', 1), \ldots, (S', w_d(S'))$ with: the data $\alpha(S, i)$
such that $(S, i) \neq (Q_=, 1)$ and $R'(S, i)^{\not\downarrow}_d = S'$, the datum $E$ if $R^{=}_d = S'$, and fresh
further data if the inequality in (ii) is strict.
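
The abstraction that relates a concrete configuration to an abstract one can be illustrated by a minimal Python sketch. It assumes, following the surrounding text rather than a definition quoted from Section 2, that a configuration is a finite set of threads (state, datum), that the bundle of a datum is the set of states paired with it, and that the abstract configuration counts, for each nonempty set of states, the data having that bundle. The names `bundles`, `abstract` and `q_eq` are hypothetical.

```python
from collections import Counter

def bundles(config):
    """Map each datum occurring in the configuration to its bundle (a frozenset of states)."""
    by_datum = {}
    for state, datum in config:
        by_datum.setdefault(datum, set()).add(state)
    return {datum: frozenset(states) for datum, states in by_datum.items()}

def abstract(config):
    """Count, for each bundle S, how many data have bundle S (assumed meaning of v(S))."""
    return Counter(bundles(config).values())

def q_eq(config, datum):
    """The set {q : (q, E) in G} for a datum E; it is empty if E does not occur in G."""
    return frozenset(q for q, d in config if d == datum)

# A small configuration with three data: d1 and d3 share the bundle {q1, q2}.
G = {("q1", "d1"), ("q2", "d1"), ("q1", "d2"), ("q2", "d3"), ("q1", "d3")}
v = abstract(G)
```

In this sketch, a bundle-preserving bijection as in the argument above exists exactly because `abstract` records only how many data carry each bundle, not which data they are.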

Composing abstract transitions gives us abstract runs of $\mathcal{A}$. Such a run on a
finite tree $(N, \Sigma, \Lambda)$ is a mapping $n \mapsto v_n$ from the nodes to abstract configurations
such that, for each nonleaf $n$, there exists $Q_=$ with $v_n \xrightarrow{\Lambda(n),\,Q_=} v_{n \cdot 0}, v_{n \cdot 1}$, and if $n$ is
the root then $q_I \in Q_=$. Defining the run to be accepting iff $v_n(S) = 0$ for all leaves
$n$ and all $S \not\subseteq F$, we have (I) above by (IIIa) and (IIIb).

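We are now ready to define $\mathcal{C}_{\mathcal{A}}$, as an ITCA that performs the steps (1)-(9)
below. States of $\mathcal{C}_{\mathcal{A}}$ are used for control and for storing $a$, $Q_=$, root (initially tt),
$S$, $R'^{\downarrow}_0$, $R'^{\not\downarrow}_0$, $R'^{\downarrow}_1$, $R'^{\not\downarrow}_1$, $q$, $R^{\downarrow}_0$, $R^{\not\downarrow}_0$, $R^{\downarrow}_1$, $R^{\not\downarrow}_1$, $d$, $?$ and $R^{=}_d$. There are $2^{|Q|} - 1$ counters
in the array $c$, and $2^{4|Q|}$ counters in the array $c'$. The steps are implemented by
$\varepsilon$-transitions, except for the $a$-transition in (4). The choices are nondeterministic.
If a choice in (3.2) is impossible, or a check in (2), (3.2) or (5) fails, then $\mathcal{C}_{\mathcal{A}}$ blocks.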

The steps (1)-(9) guess and check an accepting abstract run of $\mathcal{A}$ on a finite tree.
The counter array $c$ is used to store abstract configurations, and the counter array
$c'$ is auxiliary. The initial condition in the definition of abstract runs is checked
in (2), the final condition in (8), and steps (3)-(7) are essentially a reformulation
of the definition of abstract transitions. This particular reformulation is tailored
for a development in the proof of Theorem 4.1, and is based on observing that the
quadruples of sets $R'(S,i)^{\downarrow}_0, R'(S,i)^{\not\downarrow}_0, R'(S,i)^{\downarrow}_1, R'(S,i)^{\not\downarrow}_1$ for abstract data $(S, i) \neq (Q_=, 1)$
do not need to be stored simultaneously, i.e. that it suffices to store numbers
of such identical quadruples, which is done using the counter array $c'$.

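(1) Choose $a \in \Sigma$, and $Q_=$ with either $c[Q_=] > 0$ or $Q_= = \emptyset$.

(2) If root = tt, then check that $q_I \in Q_=$ and set root := ff.

(3) For each nonempty $S \subseteq Q$, while $c[S] > 0$ do:

(3.1) choose $R'^{\downarrow}_0, R'^{\not\downarrow}_0, R'^{\downarrow}_1, R'^{\not\downarrow}_1 \subseteq Q$;

(3.2) for each $q \in S$, choose $R^{\downarrow}_0, R^{\not\downarrow}_0, R^{\downarrow}_1, R^{\not\downarrow}_1 \models \delta(q, a, (S, c[S]) = (Q_=, 1))$, and
check that $R^{?}_d \subseteq R'^{?}_d$ for both $d \in \{0, 1\}$ and $? \in \{\downarrow, \not\downarrow\}$;

(3.3) decrement $c[S]$, and if $(S, c[S]) = (Q_=, 0)$, then choose $R^{=}_d \supseteq R'^{\not\downarrow}_d \cup R'^{\downarrow}_d$
for both $d \in \{0, 1\}$, else increment $c'[R'^{\downarrow}_0, R'^{\not\downarrow}_0, R'^{\downarrow}_1, R'^{\not\downarrow}_1]$.

(4) Perform an $a$-transition, forking with $d := 0$ and $d := 1$.

(5) Check that $R^{=}_d \supseteq \bigcup \{R'^{\downarrow}_d : c'[R'^{\downarrow}_0, R'^{\not\downarrow}_0, R'^{\downarrow}_1, R'^{\not\downarrow}_1] > 0\}$, and increment $c[R^{=}_d]$.

(6) Transfer each $c'[R'^{\downarrow}_0, R'^{\not\downarrow}_0, R'^{\downarrow}_1, R'^{\not\downarrow}_1]$ with nonempty $R'^{\not\downarrow}_d$ to $c[R'^{\not\downarrow}_d]$.

(7) Reset (i.e. decrement until 0) each $c'[R'^{\downarrow}_0, R'^{\not\downarrow}_0, R'^{\downarrow}_1, R'^{\not\downarrow}_1]$ with empty $R'^{\not\downarrow}_d$.

(8) If $c[S] = 0$ whenever $S \not\subseteq F$, then pass through a final state.

(9) Repeat from (1).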
Since $\mathcal{C}_{\mathcal{A}}$ is an ITCA, its runs may contain arbitrary errors that increase one
or more counters. Nevertheless, between executions of steps (3)-(7) by $\mathcal{C}_{\mathcal{A}}$ and
abstract transitions of $\mathcal{A}$, we have the following two-part correspondence. It shows
that the possibly erroneous executions of (3)-(7) match the abstract transitions with
the slack allowed by their definition, which in turn match the concrete transitions
with their possible introductions of junk threads (cf. (IIIa), (IIIb) and Remark 2.1).

(IVa) Whenever $v \xrightarrow{a,\,Q_=} w_0, w_1$, we have that $\mathcal{C}_{\mathcal{A}}$ can perform steps (3)-(7)
beginning with any configuration such that each $c[S]$ has value $v(S)$ and each
$c'[R'^{\downarrow}_0, R'^{\not\downarrow}_0, R'^{\downarrow}_1, R'^{\not\downarrow}_1]$ has value 0, so that for both forks $d \in \{0,1\}$ in (4),
the ending configuration is such that each $c[S]$ has value $w_d(S)$ and each
$c'[R'^{\downarrow}_0, R'^{\not\downarrow}_0, R'^{\downarrow}_1, R'^{\not\downarrow}_1]$ has value 0.

(IVb) Whenever $\mathcal{C}_{\mathcal{A}}$ can perform steps (3)-(7) beginning with a configuration such
that $a$ and $Q_=$ are as in (1) and each $c[S]$ has value $v(S)$, so that for both
forks $d \in \{0, 1\}$ in (4), the ending configuration is such that each $c[S]$ has
value $w_d(S)$, we have $v \xrightarrow{a,\,Q_=} w_0, w_1$.

— In proving (IVa), we can choose where incrementing errors occur. For each
iteration of (3.1)-(3.3), let the quadruple chosen in (3.1) be

$R'(S, c[S])^{\downarrow}_0, R'(S, c[S])^{\not\downarrow}_0, R'(S, c[S])^{\downarrow}_1, R'(S, c[S])^{\not\downarrow}_1$

so that (3.2) can succeed by (i) in the definition of abstract transitions. It remains
to match by incrementing errors, say at the end of (7), any differences between
the two sides of the inequalities in (ii).

— To obtain (IVb), let $R'(S, i)^{\downarrow}_0, R'(S, i)^{\not\downarrow}_0, R'(S, i)^{\downarrow}_1, R'(S, i)^{\not\downarrow}_1$ for each abstract
datum $(S, i)$ of $v$ be the quadruple chosen in the last performance of (3.1) with
$i = c[S]$ (due to incrementing errors, there may be more than one). Step (3.2)
ensures that (i) is satisfied. Since at the end of (3), each $c'[R'^{\downarrow}_0, R'^{\not\downarrow}_0, R'^{\downarrow}_1, R'^{\not\downarrow}_1]$
has value at least

$|\{(S,i) : (S,i) \neq (Q_=,1) \wedge \forall d, ?\,(R'(S,i)^{?}_d = R'^{?}_d)\}|$

steps (5) and (6) ensure that (ii) is satisfied.

Now, we have (II) above. The 'if' direction follows by (IVa), and the 'only if'
direction by (IVb) once we observe that, without loss of generality, we can consider
only runs of $\mathcal{C}_{\mathcal{A}}$ that do not contain incrementing errors on the array $c$ outside of
steps (3)-(7) except before the first performance of (1).

To conclude that polynomial space suffices for computing $\mathcal{C}_{\mathcal{A}}$, we observe that
each of its state variables is either from a fixed finite set, or an element of $\Sigma$,
or an element or subset of $Q$, and that deciding satisfaction of transition formulae
$\delta(q, a, (S, c[S]) = (Q_=, 1))$ in step (3.2) amounts to evaluating Boolean formulae. □

We remark that, in the opposite direction to the translation in the proof of
Theorem 3.1, by extending the proof of [Demri and Lazić 2009, Theorem 5.2] to
trees, for each ITCA $\mathcal{C}$, an ATRA$_1$ $\mathcal{A}_{\mathcal{C}}$ is computable in logarithmic space such
that $L^{\mathrm{fin}}(\mathcal{A}_{\mathcal{C}})$ consists of encodings of accepting runs of $\mathcal{C}$. Moreover, similarly
as on words, the two translations can be extended to infinite trees, where ATRA$_1$
are equipped with weak acceptance and ITCA with Büchi acceptance. Instead of
decidable and not primitive recursive as on finite trees, nonemptiness for those two
classes of automata can then be shown co-r.e.-complete.

4. SAFETY AUTOMATA 

We now show decidability of nonemptiness of forward alternating tree 1-register
automata with safety acceptance over finite or infinite data trees. More precisely,
since the class of safety ATRA$_1$ is not closed under complement, but is closed
under intersection and union (cf. Proposition 2.2(b)), we show decidability of the
inclusion problem, which implies decidability of nonemptiness of Boolean
combinations of safety ATRA$_1$. However, already for the subproblems of nonemptiness
and nonuniversality, we obtain non-elementary and non-primitive recursive lower
bounds (respectively).

Theorem 4.1. For safety ATRA$_1$, inclusion is decidable, nonemptiness is not
elementary, and nonuniversality is not primitive recursive.

Proof. Showing that the inclusion problem is decidable will involve extending:

— the proof of Proposition 2.2 to obtain an intersection of a safety and a co-safety
ATRA$_1$, which can be seen as a weak parity ATRA$_1$ with 2 priorities;

— the proof of Theorem 3.1 to obtain an ITCA with a more powerful set of
instructions and no cycles of $\varepsilon$-transitions, which can also be seen as having weak parity
acceptance with 2 priorities;

— the proof of Theorem 2.5 to obtain decidability of nonemptiness of such ITCA.

To maintain focus, we shall avoid introducing the extended notions in general, but 
concentrate on what is necessary for this part of the proof. 

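Suppose $\mathcal{A}_1 = (\Sigma, Q_1, q_I^1, F_1, \delta_1)$ and $\mathcal{A}_2 = (\Sigma, Q_2, q_I^2, F_2, \delta_2)$ are ATRA$_1$, where
we need to determine whether $L^{\mathrm{saf}}(\mathcal{A}_1)$ is a subset of $L^{\mathrm{saf}}(\mathcal{A}_2)$. By the proof of
Proposition 2.2(b), that amounts to emptiness of the intersection of $L^{\mathrm{saf}}(\mathcal{A}_1)$ and
$L^{\mathrm{cos}}(\overline{\mathcal{A}_2})$, where $\overline{\mathcal{A}_2} = (\Sigma, Q_2, q_I^2, \overline{F_2}, \overline{\delta_2})$ is the dual automaton to $\mathcal{A}_2$. Assuming
that $Q_1$ and $Q_2$ are disjoint, and do not contain $q_I^{\cap}$, let

$\mathcal{A}_\cap = (\Sigma, \{q_I^{\cap}\} \cup Q_1 \cup Q_2, q_I^{\cap}, F_1 \cup \overline{F_2}, \delta_\cap)$

be the automaton for the intersection of $\mathcal{A}_1$ and $\overline{\mathcal{A}_2}$:

$\delta_\cap(q, a, p) = \begin{cases} \delta_1(q_I^1, a, p) \wedge \overline{\delta_2}(q_I^2, a, p), & \text{if } q = q_I^{\cap} \\ \delta_1(q, a, p), & \text{if } q \in Q_1 \\ \overline{\delta_2}(q, a, p), & \text{if } q \in Q_2 \end{cases}$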

We then have: 

(*) A data tree $\tau$ with alphabet $\Sigma$ is safety-accepted by $\mathcal{A}_1$ and co-safety-accepted
by $\overline{\mathcal{A}_2}$ iff $\mathcal{A}_\cap$ has a run on $\tau$ which is final and $Q_2$-finite, i.e. there exists $l$ such
that the configuration at each node of length at least $l$ contains no threads with
states from $Q_2$.

Before proceeding, let incrementing tree counter automata with nondeterministic
transfers (shortly, ITCANT) be defined as ITCA (cf. Section 2.3), except that
(ifz, $c$) instructions are replaced by (transf, $c$, $C$) for counters $c$ and sets of
counters $C$. Such an instruction is equivalent to a loop which executes while $c$ is nonzero,
and in each iteration, decrements $c$ and increments some counter in $C$. However, in
presence of incrementing errors, the loop may not terminate, whereas (transf, $c$, $C$)
instructions are considered atomic. The effect of (transf, $c$, $C$) is therefore to
transfer the value of $c$ to the counters in $C$, among which it is split nondeterministically.
In particular, (ifz, $c$) instructions can be reintroduced as (transf, $c$, $\emptyset$).
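
To make the atomic transfer concrete, the following Python sketch enumerates the error-free outcomes of a single (transf, c, C) instruction on a counter valuation. It is an illustration under stated assumptions, not part of the construction: valuations are dictionaries from counter names to naturals, c is assumed not to belong to C, incrementing errors are not modelled, and the name `transfer_outcomes` is hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement

def transfer_outcomes(valuation, c, targets):
    """Yield every error-free result of the atomic instruction (transf, c, targets)."""
    amount = valuation[c]
    targets = sorted(targets)
    if not targets:
        # Nothing can receive the value, so the instruction is enabled only when c is zero:
        # this is how a zero test is recovered from a transfer to the empty set.
        if amount == 0:
            yield dict(valuation)
        return
    # Each multiset of `amount` targets is one way of splitting the value of c.
    for choice in combinations_with_replacement(targets, amount):
        result = dict(valuation)
        result[c] = 0
        for target, share in Counter(choice).items():
            result[target] += share
        yield result

outcomes = list(transfer_outcomes({"c": 2, "x": 0, "y": 1}, "c", {"x", "y"}))
```

Every outcome leaves c at zero and preserves the sum of all counters, which is what distinguishes the atomic instruction from a loop that incrementing errors could prolong.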

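Now, steps (1)-(9) in the proof of Theorem 3.1 can be implemented by an
ITCANT which uses nondeterministic transfers instead of the loops in (3), (6) and (7),
and whose transition relation therefore contains no cycles of $\varepsilon$-transitions. More
specifically, each reset in (7) can be implemented as a transfer to a new
auxiliary counter $c''$, (6) already consists of transfers to single counters, and (3) can be
replaced by the following two steps:

(3a) If $Q_= \neq \emptyset$, then decrement $c[Q_=]$ and choose $R^{=}_0, R^{=}_1 \subseteq Q$ such that, for each
$q \in Q_=$, there exist $R^{\downarrow}_0, R^{\not\downarrow}_0, R^{\downarrow}_1, R^{\not\downarrow}_1 \models \delta(q, a, \mathrm{tt})$ with $R^{=}_d \supseteq R^{\not\downarrow}_d \cup R^{\downarrow}_d$ for both
$d \in \{0, 1\}$.

(3b) Transfer each $c[S]$ nondeterministically to the set of all $c'[R'^{\downarrow}_0, R'^{\not\downarrow}_0, R'^{\downarrow}_1, R'^{\not\downarrow}_1]$
such that, for each $q \in S$, there exist $R^{\downarrow}_0, R^{\not\downarrow}_0, R^{\downarrow}_1, R^{\not\downarrow}_1 \models \delta(q, a, \mathrm{ff})$ with $R^{?}_d \subseteq R'^{?}_d$
for both $d \in \{0, 1\}$ and $? \in \{\downarrow, \not\downarrow\}$.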

Let $\mathcal{C}_\cap$ be such an ITCANT for $\mathcal{A}_\cap$, which in addition performs the following step
between (7) and (8), where prop is a state variable, initially ff:

(7½) If $c[S] = 0$ whenever $S \cap Q_2 \neq \emptyset$, then set prop := tt.

As in the proof of Theorem 3.1, we have that $\mathcal{C}_\cap$ is computable from $\mathcal{A}_\cap$, and
therefore from $\mathcal{A}_1$ and $\mathcal{A}_2$, in polynomial space. Also, $\mathcal{A}_\cap$ satisfies (IIIa) and (IIIb),
and $\mathcal{A}_\cap$ and $\mathcal{C}_\cap$ satisfy (IVa) and (IVb). Recalling that $\mathcal{C}_\cap$ contains no cycles of
$\varepsilon$-transitions, we infer the following from (*) above, where the notion of transitions
between levels of $\mathcal{C}_\cap$ is as in the proof of Theorem 2.5, and $P$ denotes the set of all
states of $\mathcal{C}_\cap$ in which prop has value tt:

(**) $L^{\mathrm{saf}}(\mathcal{A}_1)$ is a subset of $L^{\mathrm{saf}}(\mathcal{A}_2)$ iff there does not exist an infinite sequence
of transitions $\mathcal{G}_0 \to \mathcal{G}_1 \to \cdots$ which is from the initial level of $\mathcal{C}_\cap$ and such
that some $\mathcal{G}_i$ contains only states from $P$.

To conclude decidability of inclusion, we show that, given an ITCANT $\mathcal{C}_\cap$ and
a set $P$ of its states, existence of an infinite sequence of transitions as in (**) is
decidable. For a set $\mathbb{G}$ of levels of $\mathcal{C}_\cap$, we write $\uparrow\mathbb{G}$ to denote its upward closure
with respect to $\preceq$: the set of all $\mathcal{H}$ for which there exists $\mathcal{G} \in \mathbb{G}$ with $\mathcal{G} \preceq \mathcal{H}$. We
say that $\mathbb{G}$ is upwards closed iff $\mathbb{G} = \uparrow\mathbb{G}$, and we say that $\mathbb{H}$ is a basis for $\mathbb{G}$ iff
$\mathbb{G} = \uparrow\mathbb{H}$. As in the proof of Theorem 2.5, we have that successor sets with
respect to $\to$ are computable, $\preceq$ is a well-quasi-ordering, $\preceq$ is strongly (in particular,
reflexively) downward-compatible with $\to$, and $\preceq$ is decidable. Hence, by [Finkel
and Schnoebelen 2001, Proposition 5.4], a finite basis $\mathbb{G}_R$ of the upward closure of
the set of all levels reachable from the initial level is computable. By the strong
downward compatibility, the set of all levels from which there exists an infinite
sequence of transitions is downwards closed, so its complement is upwards closed. We
claim that a finite basis $\mathbb{G}_T$ of the latter set is computable. With that assumption,
since a finite basis $\mathbb{G}_N$ of the set of all levels that contain some state not from $P$ is
certainly computable, we are done because there does not exist an infinite sequence
of transitions as in (**) iff $\uparrow\mathbb{G}_R$ is a subset of the union of $\uparrow\mathbb{G}_T$ and $\uparrow\mathbb{G}_N$.
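It remains to establish the claim. For a finite set $\mathbb{G}'$ of levels of $\mathcal{C}_\cap$, let

$K(\mathbb{G}') = 1 + \max_{\mathcal{G} \in \mathbb{G}'} \max_{(q, v) \in \mathcal{G}} \sum_{c=1}^{k} v(c)$

where $k$ is the number of counters of $\mathcal{C}_\cap$. Let also $\mathrm{Pred}_\forall(\mathbb{G}')$ be the upwards-closed
set consisting of all $\mathcal{G}$ such that, whenever $\mathcal{G} \to \mathcal{G}'$, we have $\mathcal{G}' \in \uparrow\mathbb{G}'$. Observe
that, whenever $\mathcal{G} \in \mathrm{Pred}_\forall(\mathbb{G}')$, there exists $\mathcal{G}_\dagger \in \mathrm{Pred}_\forall(\mathbb{G}')$ such that $\mathcal{G}_\dagger \preceq \mathcal{G}$ and,
for each $(q, v) \in \mathcal{G}_\dagger$ and $c \in \{1, \ldots, k\}$, $v(c) \leq K(\mathbb{G}')$. Hence, a finite basis of
$\mathrm{Pred}_\forall(\mathbb{G}')$ is computable, so the following is an effective procedure: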

(i) Begin with $\mathbb{G}_T := \emptyset$.

(ii) Let $\mathbb{H}$ be a finite basis of $\mathrm{Pred}_\forall(\mathbb{G}_T)$.

(iii) If $\mathbb{H} \not\subseteq \uparrow\mathbb{G}_T$, then set $\mathbb{G}_T := \mathbb{G}_T \cup \mathbb{H}$ and repeat from (ii), else terminate.

Since $\preceq$ is a well-quasi-ordering, the procedure terminates and computes a basis of
the set of all levels from which every sequence of transitions is finite, as required.
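
The saturation procedure (i)-(iii) can be sketched in Python over a deliberately simplified domain. The sketch assumes that levels are replaced by tuples of naturals under the componentwise order (a well-quasi-order by Dickson's lemma), which is only a stand-in for the ordering on levels used in the proof. The callback `pred_basis` plays the role of computing a finite basis of the universal predecessor set, and `toy_pred` is a hypothetical one-counter system invented for the example.

```python
def leq(x, y):
    """Componentwise order on equal-length tuples of naturals."""
    return len(x) == len(y) and all(a <= b for a, b in zip(x, y))

def in_up(x, basis):
    """Is x in the upward closure of the basis?"""
    return any(leq(b, x) for b in basis)

def up_subset(basis1, basis2):
    """Is the upward closure of basis1 contained in that of basis2?"""
    return all(in_up(b, basis2) for b in basis1)

def minimise(basis):
    """Keep only the minimal elements; the upward closure is unchanged."""
    unique = set(basis)
    return sorted(x for x in unique if not any(y != x and leq(y, x) for y in unique))

def saturate(pred_basis):
    """Steps (i)-(iii): grow the basis until the predecessor basis adds nothing new."""
    g_t = []                                   # (i)
    while True:
        h = pred_basis(g_t)                    # (ii)
        if up_subset(h, g_t):                  # (iii), terminating case
            return minimise(g_t)
        g_t = minimise(g_t + list(h))          # (iii), repeating case

def toy_pred(basis):
    # Hypothetical system on one counter: the move n -> n + 1 is possible only while n < 3.
    if not basis:
        return [(3,)]                          # values with no successor at all
    lowest = min(b[0] for b in basis)
    return [(max(lowest - 1, 0),)]

result = saturate(toy_pred)
```

In the toy system every sequence of moves is finite, so the computed basis generates all values. Termination of the loop in general relies on the well-quasi-ordering, exactly as in the argument above.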

We shall establish that nonemptiness of safety ATRA$_1$ is not elementary by a
two-stage reduction, which separates dealing with the inability of one-way alternating
1-register automata to detect incrementing errors in encodings of computations of
counter machines, from ensuring acceptance only of encodings of computations in
which counters are bounded by a tower of exponentiations. More precisely, we shall
use the following problem as intermediary. The notation $2 \Uparrow m$ is as in Example 2.4.

(***) Given a deterministic counter machine $C$ and $m \geq 1$ in unary, does $C$ have
a computation which possibly contains incrementing errors, in which every
counter value is at most $2 \Uparrow m$, and which is either halting or infinite?

Such a machine is a tuple $(Q, q_I, q_H, k, \delta)$ where: $Q$ is a finite set of states, $q_I$ is
the initial state, $q_H$ is the halting state, $k \in \mathbb{N}$ is the number of counters, and
$\delta : Q \setminus \{q_H\} \to \{1, \ldots, k\} \times (Q \cup Q^2)$ is a transition function. Thus, from a
state $q \neq q_H$, either $\delta(q)$ is of the form $(c, q')$, which means that the machine
increments $c$ and goes to $q'$, or $\delta(q)$ is of the form $(c, q', q'')$, which means that, if
$c$ is zero, then the machine goes to $q'$, else it decrements $c$ and goes to $q''$. More
precisely, a configuration is a state together with a counter valuation, and we write
$(q, v) \to (q', v')$ iff, for some $v_\dagger \geq v$ and $v'_\dagger \leq v'$,

— either $\delta(q) = (c, q')$ and $v'_\dagger = v_\dagger[c \mapsto v_\dagger(c) + 1]$,
— or $\delta(q) = (c, q', q'')$, $v_\dagger(c) = 0$ and $v'_\dagger = v_\dagger$,
— or $\delta(q) = (c, q'', q')$ and $v'_\dagger = v_\dagger[c \mapsto v_\dagger(c) - 1]$.

We say that the transition is error-free iff the above holds with $v_\dagger = v$ and $v'_\dagger = v'$.
A computation is a sequence $(q_0, v_0) \to (q_1, v_1) \to \cdots$ such that $q_0 = q_I$ and $v_0 = \mathbf{0}$.
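
The transition relation with incrementing errors can be checked directly, as in the following Python sketch. It assumes that valuations are dictionaries from counters to naturals with identical key sets, and that `delta` maps each non-halting state to a pair (c, q') or a triple (c, q', q''). The function names are hypothetical. Since the intermediate valuations range over naturals, the existential quantification in the definition reduces to the comparisons below.

```python
def error_free_step(delta, q, v):
    """Return the unique error-free successor configuration of (q, v)."""
    instr = delta[q]
    c = instr[0]
    w = dict(v)
    if len(instr) == 2:            # (c, q'): increment c
        w[c] += 1
        return instr[1], w
    if v[c] == 0:                  # (c, q', q''): c is zero, go to q'
        return instr[1], w
    w[c] -= 1                      # otherwise decrement c and go to q''
    return instr[2], w

def can_step(delta, q, v, q2, v2):
    """Decide whether (q, v) -> (q2, v2) holds when incrementing errors are allowed."""
    instr = delta[q]
    c = instr[0]
    others_ok = all(v[d] <= v2[d] for d in v if d != c)
    if len(instr) == 2:
        return q2 == instr[1] and others_ok and v[c] + 1 <= v2[c]
    _, zero_target, dec_target = instr
    ok = False
    if q2 == zero_target:
        # Errors can only raise counters, so the zero test needs v(c) = 0 exactly.
        ok = ok or (v[c] == 0 and others_ok)
    if q2 == dec_target:
        # If v(c) = 0, an error may first raise c to 1 before the decrement.
        ok = ok or (others_ok and max(v[c], 1) - 1 <= v2[c])
    return ok

# A two-state machine over one counter; "h" is the halting state.
delta = {"p": (1, "r"), "r": (1, "h", "p")}
```

For instance, `can_step(delta, "r", {1: 0}, "p", {1: 0})` holds although the error-free step from that configuration goes to the halting state: an incrementing error before the test makes the decrement branch available.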

To show that (***) is not elementary, we reduce from the problem of whether a
deterministic 2-counter machine of size $m$ has an error-free halting computation of
length at most $2 \Uparrow m$. Given such a machine $C$ whose counters are $c_1$ and $c_2$, let
$\widehat{C}$ be a deterministic machine with counters $c_1$, $c_2$, $\bar{c}_1$, $\bar{c}_2$, $\hat{c}$, $c'$, $c''$ and $c'''$, which
performs the following and then halts:
    c' := m; inc(\bar{c}_i);
    while c' > 0
    { dec(c'); while \bar{c}_i > 0 { dec(\bar{c}_i); inc(c'') }; inc(\bar{c}_i);
      while c'' > 0
      { dec(c''); while \bar{c}_i > 0 { dec(\bar{c}_i); inc(c''') };
        while c''' > 0 { dec(c'''); inc(\bar{c}_i); inc(\bar{c}_i) } } }

Fig. 3. Computing $2 \Uparrow m$


(I) For both $i \in \{1, 2\}$, set $\bar{c}_i$ to $2 \Uparrow m$ by executing the pseudo-code in Figure 3.
The loops over $c'$, $c''$ and $c'''$ implement $\bar{c}_i := 2 \Uparrow c'$, $\bar{c}_i := 2^{c''}$ and $\bar{c}_i := 2 \times c'''$
(respectively).

(II) Simulate $C$ using $c_1$ and $c_2$, and after each step:
— increment $\hat{c}$;
— if $c_i$ has been incremented, then decrement $\bar{c}_i$;
— if $c_i$ has been decremented, then increment $\bar{c}_i$;
— if $C$ has halted, then go to (III).

(III) For both $i \in \{1, 2\}$, transfer $\bar{c}_i$ to $c_i$.
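
As a sanity check of stage (I), the loops of Figure 3 can be run without errors in Python and compared against a direct computation of the tower. This is only an illustrative simulation: the variable names `c1`, `c2`, `c3` and `cbar` stand for the counters c', c'', c''' and the counter being set, and `tower` encodes the assumption that 2 ⇑ 0 = 1 and 2 ⇑ (m + 1) = 2^(2 ⇑ m).

```python
def tower(m):
    """A tower of m exponentiations of 2, with tower(0) = 1."""
    value = 1
    for _ in range(m):
        value = 2 ** value
    return value

def figure3(m):
    """Run the Figure 3 loops using unit increments and decrements only, without errors."""
    c1, c2, c3, cbar = m, 0, 0, 0
    cbar += 1                      # inc(cbar): cbar = 1
    while c1 > 0:                  # outer loop: cbar := 2 ** cbar, repeated m times
        c1 -= 1
        while cbar > 0:            # move cbar into c''
            cbar -= 1
            c2 += 1
        cbar += 1                  # cbar = 1 = 2 ** 0
        while c2 > 0:              # double cbar once per unit of c''
            c2 -= 1
            while cbar > 0:        # move cbar into c'''
                cbar -= 1
                c3 += 1
            while c3 > 0:          # cbar := 2 * c'''
                c3 -= 1
                cbar += 2
    return cbar
```

The simulation takes time proportional to the tower itself, so it is practical only for very small m; the point of the reduction is precisely that the machine is small while the values it reaches are not.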

Observe that $\widehat{C}$ is computable in space logarithmic in $m$. If $C$ has an error-free
halting computation of length at most $2 \Uparrow m$, running $\widehat{C}$ without errors indeed halts
and does not involve counter values greater than $2 \Uparrow m$. For the converse, suppose
$\widehat{C}$ has a computation which possibly contains incrementing errors, in which every
counter value is at most $2 \Uparrow m$, and which is either halting or infinite. By the
construction of $\widehat{C}$ and the boundedness of counter values, the computation cannot
be infinite, so it is halting. Since $\bar{c}_1$ and $\bar{c}_2$ were set to $2 \Uparrow m$ by stage (I), and since
stage (III) terminated, the halting computation of $C$ in stage (II) must have been
error-free and it is certainly of length at most $2 \Uparrow m$.

To reduce from (***) to nonemptiness of safety ATRA$_1$, consider a deterministic
counter machine $C = (Q, q_I, q_H, k, \delta)$ and $m \geq 1$. We can assume that $q' \neq q''$
whenever $\delta(q) = (c, q', q'')$. By the proof of [Demri and Lazić 2009, Theorem 5.2],
which uses essentially the same encoding of computations of counter machines into
data words as in the proof of [Bojańczyk et al. 2006, Theorem 14], we have that
an ATRA$_1$ $\mathcal{A}_C$ with alphabet $Q$ is computable in space logarithmic in $|C|$, such
that it safety-accepts a data tree $\tau$ iff the left-most path in $\tau$ (i.e., the sequence of
nodes obtained by starting from the root and repeatedly taking the left-hand child)
satisfies the following:

— the letter of the first node is $q_I$, and either the letter of the last nonleaf node is
$q_H$ or the sequence is infinite;

— for all letters $q$ and $q'$ of two consecutive nodes $n$ and $n'$ (respectively),
— either $\delta(q)$ is of the form $(c, q')$ and we say that $n$ is $c$-incrementing,
— or $\delta(q)$ is of the form $(c, q', q'')$ and we say that $n$ is $c$-zero-testing,
— or $\delta(q)$ is of the form $(c, q'', q')$ and we say that $n$ is $c$-decrementing;

— for each counter $c$, no two $c$-incrementing nodes are labelled by the same datum,
no two $c$-decrementing nodes are labelled by the same datum, and whenever a
$c$-incrementing node $n$ is followed by a $c$-zero-testing node $n'$, then a $c$-decrementing
node with the same datum as $n$ must occur between $n$ and $n'$.
Hence, by taking the left-most paths in data trees that are safety-accepted by $\mathcal{A}_C$
and erasing data, we obtain exactly the sequences of states of halting or infinite
computations of $C$ which possibly contain incrementing errors. Assuming that $b_1$,
$\ldots$, $b_m$, $*$ are not in $Q$, to restrict further to computations of $C$ in which every
counter value is at most $2 \Uparrow m$, it suffices to strengthen $\mathcal{A}_C$ to obtain a safety
ATRA$_1$ $\mathcal{A}^m_C$ with alphabet $Q \cup \{b_1, \ldots, b_m, *\}$ which requires that:

— whenever a node $n$ in the left-most path is $c$-incrementing, then the automaton
$\mathcal{B}_m$ from Example 2.4 safety-accepts the right-hand subtree at $n$;

— whenever a node $n$ in the left-most path is $c$-incrementing, $n'$ is either $n$ or
a subsequent $c$-incrementing node, and no $c$-decrementing node with the same
datum as $n$ occurs between $n$ and $n'$, then the right-hand subtree at $n'$ contains
a node with letter $b_m$ and the same datum as $n$.

Finally, that nonuniversality of safety ATRA$_1$ is not primitive recursive follows
from the same lower bound for nonuniversality of safety one-way alternating
automata with 1 register over data words [Lazić 2006]. □

5. XPATH SATISFIABILITY 

In this section, we first describe how XML documents and DTDs can be represented 
by data trees and tree automata. We then introduce a forward fragment of XPath, 
and a safety subfragment. By translating XPath queries to forward alternating 
tree 1-register automata, and applying results from Sections 3 and 4, we obtain 
decidability of satisfiability for forward XPath on finite documents and for safety
forward XPath on finite or infinite documents. 

XML Trees. Suppose $\Sigma$ is a finite set of element types, $\Sigma'$ is a finite set of
attribute names, and $\Sigma$ and $\Sigma'$ are disjoint. An XML document [Bray et al. 1998]
is an unranked ordered tree whose every node $n$ is labelled by some $\mathrm{type}(n) \in \Sigma$
and by a datum for each element of some $\mathrm{atts}(n) \subseteq \Sigma'$. Motivated by processing
of XML streams (cf., e.g., [Olteanu et al. 2004]), we do not restrict our attention
to finite XML documents.

Concerning the data in XML documents, we shall consider only the equality
predicate between data labels. Equality comparisons with constants are straightforward
to encode using additional attribute names. Therefore, similarly as Bojańczyk et al.
[2009], we represent an XML document by a data tree with alphabet $\Sigma \cup \Sigma'$, where
each node $n$ is represented by a sequence of $1 + |\mathrm{atts}(n)|$ nodes: the first node
is labelled by $\mathrm{type}(n)$, the labels of the following nodes enumerate $\mathrm{atts}(n)$, the
children of the last node represent the first child and the next sibling of $n$ (if any),
and for each preceding node in the sequence, its left-hand child is the next node
and its right-hand child is a leaf. We say that such a data tree is an XML tree.
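
The representation can be illustrated by a short Python sketch. Several details are assumptions made for the example rather than taken from the text: leaves are represented by `None`, the left-hand child of the last node of a chain encodes the first child and the right-hand child encodes the next sibling, attributes are enumerated in sorted order, and the datum at a type node is left as `None`. The classes `Element` and `Node` and the function `encode` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Element:
    type: str
    atts: dict = field(default_factory=dict)       # attribute name -> datum
    children: list = field(default_factory=list)   # child Elements, in document order

@dataclass
class Node:
    letter: str
    datum: object
    left: Optional["Node"] = None     # None stands for a leaf
    right: Optional["Node"] = None

def encode(siblings, start=0):
    """Encode siblings[start:] as a binary data tree; return None (a leaf) if there are none."""
    if start >= len(siblings):
        return None
    element = siblings[start]
    # The chain: the type node, followed by one node per attribute.
    labels = [(element.type, None)] + sorted(element.atts.items())
    chain = [Node(letter, datum) for letter, datum in labels]
    for upper, lower in zip(chain, chain[1:]):
        upper.left = lower            # the right-hand child stays a leaf
    last = chain[-1]
    last.left = encode(element.children)        # first child of the element
    last.right = encode(siblings, start + 1)    # next sibling of the element
    return chain[0]

doc = Element("a", {"id": 7}, [Element("b"), Element("c", {"x": 1, "y": 2})])
root = encode([doc])
```

Here `root` is the node for the type `a`, its left-hand child carries the attribute `id` with datum 7, and the elements `b` and `c` appear below that attribute node as first child and next sibling respectively.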

Following Benedikt et al. [2008] and Bojańczyk et al. [2009], we assume without
loss of generality that document type definitions (DTDs) [Bray et al. 1998] are
given as regular tree languages. More precisely, we consider a DTD to be a forward
nondeterministic tree automaton $T$ with alphabet $\Sigma \cup \Sigma'$ and without $\varepsilon$-transitions.
Such automata can be defined by omitting counters and $\varepsilon$-transitions from ITCA
(cf. Section 2.3). Infinite trees are processed in safety mode, i.e. the condition that
an infinite run of $T$ has to satisfy to be accepting is the same as for finite runs: for
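
$\tau, n, n' \models \varepsilon \;\stackrel{\mathrm{def}}{\Leftrightarrow}\; n = n'$

$\tau, n, n' \models \{\triangledown, \triangle, \triangleright, \triangleleft\} \;\stackrel{\mathrm{def}}{\Leftrightarrow}\; n \,\{\triangledown, \triangle, \triangleright, \triangleleft\}\, n'$

$\tau, n, n' \models \{\triangledown^*, \triangle^*, \triangleright^*, \triangleleft^*\} \;\stackrel{\mathrm{def}}{\Leftrightarrow}\; n \,\{\triangledown^*, \triangle^*, \triangleright^*, \triangleleft^*\}\, n'$

$\tau, n, n' \models p_1 / p_2 \;\stackrel{\mathrm{def}}{\Leftrightarrow}\;$ there exists $n''$ such that $\tau, n, n'' \models p_1$ and $\tau, n'', n' \models p_2$

$\tau, n, n' \models p_1 \cup p_2 \;\stackrel{\mathrm{def}}{\Leftrightarrow}\; \tau, n, n' \models p_1$ or $\tau, n, n' \models p_2$

$\tau, n, n' \models p[u] \;\stackrel{\mathrm{def}}{\Leftrightarrow}\; \tau, n, n' \models p$ and $\tau, n' \models u$

$\tau, n \models p? \;\stackrel{\mathrm{def}}{\Leftrightarrow}\;$ there exists $n'$ such that $\tau, n, n' \models p$

$\tau, n \models a \;\stackrel{\mathrm{def}}{\Leftrightarrow}\; \Lambda(n) = a$

$\tau, n \models p_1/@a'_1 \bowtie p_2/@a'_2 \;\stackrel{\mathrm{def}}{\Leftrightarrow}\;$ there exist $n_1, k_1, n_2, k_2$ such that
    $\tau, n, n_1 \models p_1$, $k_1 \leq |\mathrm{atts}(n_1)|$, $\Lambda(n_1 \cdot 0^{k_1}) = a'_1$,
    $\tau, n, n_2 \models p_2$, $k_2 \leq |\mathrm{atts}(n_2)|$, $\Lambda(n_2 \cdot 0^{k_2}) = a'_2$,
    $\Delta(n_1 \cdot 0^{k_1}) \bowtie \Delta(n_2 \cdot 0^{k_2})$

Fig. 4. Semantics of Queries and Qualifiers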



each leaf $n$, the state of the configuration at $n$ is final. An XML tree $\tau$ as above is
regarded to satisfy $T$ iff $T$ accepts $\mathrm{tree}(\tau)$.

Fragments of XPath. The fragment of XPath [Clark and DeRose 1999] below
contains all operators commonly found in practice and was considered in [Benedikt
et al. 2008; Geerts and Fan 2005]. The grammars of queries $p$ and qualifiers $u$ are
mutually recursive. The element types $a$ and attribute names $a'$ range over $\Sigma$ and
$\Sigma'$, respectively.

$p ::= \varepsilon \mid \triangledown \mid \triangle \mid \triangleright \mid \triangleleft \mid \triangledown^* \mid \triangle^* \mid \triangleright^* \mid \triangleleft^* \mid p/p \mid p \cup p \mid p[u]$

$u ::= \neg u \mid u \wedge u \mid p? \mid a \mid p/@a' = p/@a' \mid p/@a' \neq p/@a'$

We say that a query or qualifier is forward iff:

— it does not contain $\triangle$, $\triangleleft$, $\triangle^*$ or $\triangleleft^*$;

— for every subqualifier of the form $p_1/@a'_1 \bowtie p_2/@a'_2$, we have that $p_1 = \varepsilon$ and
that $p_2$ is of the form $\varepsilon$ or $\triangledown/p'_2$ or $\triangleright/p'_2$.

A safety (resp., co-safety) query or qualifier is one in which each occurrence of $\triangledown$,
$\triangledown^*$ or $\triangleright^*$ is under an odd (resp., even) number of negations. Since infinite XML
documents may contain infinite siblinghoods, $\triangledown$, $\triangledown^*$ and $\triangleright^*$ are exactly the queries
that may require existence of a node which can be unboundedly far.
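
The syntactic safety condition amounts to a parity check over the syntax tree, which the following Python sketch illustrates. The tuple encoding of queries and qualifiers (tags such as `"seq"`, `"filter"`, `"exists"` and `"cmp"`, and axis names such as `"child*"`) is an assumption made for this example and is not notation from the article.

```python
FAR_AXES = {"child", "child*", "next*"}   # the child, descendant-or-self and following-sibling-or-self axes

def negation_parities(term, negations=0):
    """Yield the parity (0 even, 1 odd) of the negations above each occurrence of a far axis."""
    tag = term[0]
    if tag == "axis":
        if term[1] in FAR_AXES:
            yield negations % 2
    elif tag == "not":
        yield from negation_parities(term[1], negations + 1)
    elif tag in ("seq", "union", "and", "filter"):
        yield from negation_parities(term[1], negations)
        yield from negation_parities(term[2], negations)
    elif tag == "exists":                   # the qualifier p?
        yield from negation_parities(term[1], negations)
    elif tag == "cmp":                      # ("cmp", op, p1, a1, p2, a2)
        yield from negation_parities(term[2], negations)
        yield from negation_parities(term[4], negations)
    # "eps" and "type" contain no axes

def is_safety(term):
    return all(parity == 1 for parity in negation_parities(term))

def is_co_safety(term):
    return all(parity == 0 for parity in negation_parities(term))

# A query of the shape next*/child*[eps/@a1 = (child/child*)/@a2], and its negated use.
inner = ("cmp", "=", ("eps",), "a1", ("seq", ("axis", "child"), ("axis", "child*")), "a2")
p = ("seq", ("axis", "next*"), ("filter", ("axis", "child*"), inner))
safe = ("filter", ("eps",), ("not", ("exists", p)))
```

With this encoding, `p` is co-safety but not safety, whereas `safe`, which places `p` under a single negation, is safety. A term without any of the three axes satisfies both predicates.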

The semantics of queries and qualifiers is standard (cf., e.g., [Geerts and Fan
2005]). We write the satisfaction relations as $\tau, n, n' \models p$ and $\tau, n \models u$, where $\tau$ is
an XML tree $(N, \Sigma \cup \Sigma', \Lambda, \Delta)$, and $n$ and $n'$ are $\Sigma$-labelled nodes. The definition is
recursive over the grammars of queries and qualifiers, and can be found in Figure 4.
We omit the Boolean cases, and we write $\triangledown$, $\triangle$, $\triangleright$ and $\triangleleft$ for the relations between
$\Sigma$-labelled nodes that correspond to the child, parent, next-sibling and
previous-sibling relations (respectively) in the document that $\tau$ represents.

We say that $\tau$ satisfies $p$ iff $\tau, \varepsilon, n' \models p$ for some $n'$.

ACM Transactions on Computational Logic, Vol. V, No. N, Month 20YY. 



20 • M. Jurdziński and R. Lazić



Example 5.1. Suppose $a'_1, a'_2 \in \Sigma'$. The forward query $p_{a'_1, a'_2} = \triangleright^*/\triangledown^*[\varepsilon/@a'_1 =
(\triangledown/\triangledown^*)/@a'_2]$ is satisfied by $\Sigma$-labelled nodes $n_0$ and $n_1$ iff $n_0 \triangleright^* \triangledown^* n_1$ and there
exists $n_2$ such that $n_1 \triangledown^+ n_2$ and the value of attribute $a'_1$ at $n_1$ is equal to the value
of attribute $a'_2$ at $n_2$. Hence, the safety forward query $\varepsilon[\neg(p_{a'_1, a'_2}?)]$ is satisfied by
an XML tree over $\Sigma$ and $\Sigma'$ (whose root may have younger siblings) iff the value of
$a'_1$ at a node is never equal to the value of $a'_2$ at a descendant.

Suppose a query p and a DTD T are over the same element types and attribute 
names. We say that p is satisfiable relative to T iff there exists an XML tree which 
satisfies p and T. Finitary satisfiability restricts to finite XML trees. 

Complexity of Satisfiability. Let us regard a forward qualifier $u$ over element
types $\Sigma$ and attribute names $\Sigma'$ as finitely equivalent to an ATRA$_1$ $\mathcal{A}$ with alphabet
$\Sigma \cup \Sigma'$ iff, for every finite XML tree $\tau$ over $\Sigma$ and $\Sigma'$, and $\Sigma$-labelled node $n$, we
have $\tau, n \models u$ iff $\mathcal{A}$ accepts the subtree of $\tau$ rooted at $n$. For safety (resp., co-safety)
$u$, safety (resp., co-safety) equivalence is defined by also considering infinite XML
trees and safety (resp., co-safety) acceptance by $\mathcal{A}$.

To formalise the corresponding notions for queries, we introduce the following
kind of automata "with holes". Query automata are defined in the same way as
ATRA$_1$ (cf. Section 2.2), except that:

— transition formulae may contain a new atomic formula $H$;

— no path in the successor graph from the initial state to a state $q$ such that $H$
occurs in some transition formula at $q$ may contain an update edge.

The vertices of the successor graph are all states, there is an edge from $q$ to $r$ iff
$r(0, \downarrow)$, $r(0, \not\downarrow)$, $r(1, \downarrow)$ or $r(1, \not\downarrow)$ occurs in some transition formula at $q$, and such
an edge is called update iff $r(0, \downarrow)$ or $r(1, \downarrow)$ occurs in some transition formula at $q$.

To define a run of a query automaton on a data tree $\tau$ with the same alphabet
and with respect to a set of nodes $N'$, we augment the definition of runs of ATRA$_1$
so that whenever a transition formula is evaluated at a node $n$, each occurrence of
$H$ is treated as $\top$ if $n \in N'$, and as $\bot$ if $n \notin N'$. Acceptance of a finite data tree,
safety acceptance, and co-safety acceptance, all with respect to a set of nodes for
interpreting $H$, are then defined as for ATRA$_1$.

For a query automaton $\mathcal{A}$ and an ATRA$_1$ or query automaton $\mathcal{A}'$ with the same
alphabet and initial states $q_I$ and $q'_I$ (respectively), we define the substitution of $\mathcal{A}'$
for the hole in $\mathcal{A}$ by forming a disjoint union of $\mathcal{A}$ and $\mathcal{A}'$, taking $q_I$ as the initial
state, and substituting each occurrence of $H$ in each transition formula $\delta(q, a, b)$ of $\mathcal{A}$
by $\delta(q'_I, a, b)$. Observe that the unreachability in $\mathcal{A}$ of $H$ from $q_I$ by a path with an
update edge means that the composite automaton transmits initial register values
to $\mathcal{A}'$ without changes.

Now, we say that a forward query $p$ over element types $\Sigma$ and attribute names
$\Sigma'$ is finitely equivalent to a query automaton $\mathcal{B}$ with alphabet $\Sigma \cup \Sigma'$ iff, for every
finite XML tree $\tau$ over $\Sigma$ and $\Sigma'$, $\Sigma$-labelled node $n$, and set $N'$ of $\Sigma$-labelled
nodes, we have $\tau, n, n' \models p$ for some $n' \in N'$ iff $\mathcal{B}$ accepts the subtree of $\tau$ rooted
at $n$ with respect to $N'$. For safety (resp., co-safety) $p$, safety (resp., co-safety)
equivalence is defined by also considering infinite XML trees and safety (resp.,
co-safety) acceptance by $\mathcal{B}$.
Theorem 5.2. For each forward query $p$ (resp., forward qualifier $u$) over $\Sigma$ and
$\Sigma'$, a finitely equivalent query automaton $\mathcal{B}_p^{\Sigma,\Sigma'}$ (resp., ATRA$_1$ $\mathcal{A}_u^{\Sigma,\Sigma'}$) is computable
in logarithmic space. If $p$ (resp., $u$) is safety, then it is safety equivalent to $\mathcal{B}_p^{\Sigma,\Sigma'}$
(resp., $\mathcal{A}_u^{\Sigma,\Sigma'}$).

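Proof. The translations are defined recursively over the grammars of queries
and qualifiers:

— $\mathcal{B}_{\varepsilon}^{\Sigma,\Sigma'}$, $\mathcal{B}_{\triangledown}^{\Sigma,\Sigma'}$, $\mathcal{B}_{\triangleright}^{\Sigma,\Sigma'}$, $\mathcal{B}_{\triangledown^*}^{\Sigma,\Sigma'}$, $\mathcal{B}_{\triangleright^*}^{\Sigma,\Sigma'}$ and $\mathcal{A}_{a}^{\Sigma,\Sigma'}$ are straightforward to define;

— $\mathcal{B}_{p \cup p'}^{\Sigma,\Sigma'}$ is formed from $\mathcal{B}_{p}^{\Sigma,\Sigma'}$ and $\mathcal{B}_{p'}^{\Sigma,\Sigma'}$ by disjunctive disjoint union, $\mathcal{A}_{\neg u}^{\Sigma,\Sigma'}$ is
formed from $\mathcal{A}_{u}^{\Sigma,\Sigma'}$ by dualisation, and $\mathcal{A}_{u \wedge u'}^{\Sigma,\Sigma'}$ is formed from $\mathcal{A}_{u}^{\Sigma,\Sigma'}$ and $\mathcal{A}_{u'}^{\Sigma,\Sigma'}$
by conjunctive disjoint union (cf. the proof of Proposition 2.2);

— to obtain $\mathcal{B}_{p/p'}^{\Sigma,\Sigma'}$, we substitute $\mathcal{B}_{p'}^{\Sigma,\Sigma'}$ for the hole in $\mathcal{B}_{p}^{\Sigma,\Sigma'}$;

— to obtain $\mathcal{B}_{p[u]}^{\Sigma,\Sigma'}$, we substitute a conjunctive disjoint union of $\mathcal{B}_{\varepsilon}^{\Sigma,\Sigma'}$ and $\mathcal{A}_{u}^{\Sigma,\Sigma'}$
for the hole in $\mathcal{B}_{p}^{\Sigma,\Sigma'}$;

— $\mathcal{A}_{p?}^{\Sigma,\Sigma'}$ is formed from $\mathcal{B}_{p}^{\Sigma,\Sigma'}$ by substituting $\top$ for $H$;

— an automaton for $\varepsilon/@a'_1 = (\triangledown/p)/@a'_2$ is formed by substituting the second
automaton depicted in Figure 5 (cf. Example 2.4 for depicting conventions) for
the hole in $\mathcal{B}_{p}^{\Sigma,\Sigma'}$, and substituting the result for the hole in the first automaton
depicted in Figure 5;

— the remaining cases in the grammar of qualifiers are handled similarly.

The required equivalences, as well as that if $p$ (resp., $u$) is co-safety then it is
co-safety equivalent to $\mathcal{B}_p^{\Sigma,\Sigma'}$ (resp., $\mathcal{A}_u^{\Sigma,\Sigma'}$), are shown simultaneously by induction. □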

Theorem 5.3. (a) For forward XPath and arbitrary DTDs, satisfiability over
finite XML trees is decidable.
(b) For safety forward XPath and arbitrary DTDs, satisfiability over finite or
infinite XML trees is decidable.

Proof. Given a forward query $p$ and a DTD $T$ over element types $\Sigma$ and
attribute names $\Sigma'$, by Theorem 5.2, an ATRA$_1$ $\mathcal{A}_{p?}^{\Sigma,\Sigma'}$ is computable, which is finitely
equivalent to the qualifier $p?$. We can then compute an ITCA $\mathcal{C}(\mathcal{A}_{p?}^{\Sigma,\Sigma'})$ as in the
proof of Theorem 3.1, which recognises exactly trees obtained by erasing data from
finite XML trees that satisfy $p$. To conclude (a), we observe that ITCA are closed
(in logarithmic space) under intersections with forward nondeterministic tree
automata, and apply Theorem 2.5.

For (b), supposing that $p$ is safety, by Theorem 5.2 again, an ATRA$_1$ $\mathcal{A}_{p?}^{\Sigma,\Sigma'}$ is
computable, which is safety equivalent to the qualifier $p?$. Applying the proof of
Theorem 4.1 to $\mathcal{A}_{p?}^{\Sigma,\Sigma'}$ and an ATRA$_1$ whose safety language is empty, we can
compute an ITCANT $\mathcal{C}'(\mathcal{A}_{p?}^{\Sigma,\Sigma'})$, which contains no cycles of $\varepsilon$-transitions and recognises
exactly trees obtained by erasing data from finite or infinite XML trees that satisfy
$p$. It remains to observe that ITCANT with no cycles of $\varepsilon$-transitions are closed
(in logarithmic space) under intersections with forward nondeterministic tree
automata, and to recall that their nonemptiness was shown decidable also in the proof
of Theorem 4.1. □

We remark that, by the proof of [Demri and Lazić 2009, Theorem 5.2], finitary
satisfiability for forward XPath with DTDs is not primitive recursive, even without
sibling axes (i.e., $\triangleright$ and $\triangleright^*$).

6. CONCLUDING REMARKS 

It would be interesting to know more about the complexities of nonemptiness for
safety ATRA$_1$ and satisfiability for safety forward XPath with DTDs. By
Theorem 4.1, the former is decidable and not elementary, and by Theorem 5.3(b), the
latter is decidable.